Stability and Diversity in Collective Adaptation 
We derive a class of macroscopic differential equations that describe collective adaptation, starting from a discrete-time stochastic microscopic model. The behavior of each agent is a dynamic balance between adaptation that locally achieves the best action and memory loss that leads to randomized behavior. We show that, although individual agents interact with their environment and other agents in a purely self-interested way, macroscopic behavior can be interpreted as game dynamics. Application to several familiar, explicit game interactions shows that the adaptation dynamics exhibits a diversity of collective behaviors. The simplicity of the assumptions underlying the macroscopic equations suggests that these behaviors should be expected broadly in collective adaptation. We also analyze the adaptation dynamics from an information-theoretic viewpoint and discuss self-organization induced by information flux between agents, giving a novel view of collective adaptation.
I. INTRODUCTION

Collective behavior in groups of adaptive systems is 
an important and cross-cutting topic that appears under 
various guises in many fields, including biology, neuro- 
sciences, computer science, and social science. In all these 
adaptive systems, individual agents interact with one an- 
other and modify their behaviors according to the infor- 
mation they receive through those interactions. Often, 
though, collective behaviors emerge that are beyond the individual agent's perceptual capabilities and that sometimes frustrate satisfying the local goals. With competitive interactions, dynamic adaptation can produce rich and unexpected behaviors. This kind of mutual adaptation has been discussed, for example, in studies of biological group interaction [...], interactive learning [...], large-scale adaptive systems [...], and learning in games [...].

Here we develop a class of coupled differential equa- 
tions for mutual adaptation in agent collectives — systems 
in which agents learn how to act in their environment 
and with other agents through reinforcement of their ac- 
tions. We show that the adaptive behavior in agent col- 
lectives, in special cases, reduces to a generalized form 
of multipopulation replicator equations and, generally, 
can be viewed as a kind of information-theoretic self- 
organization in a collective adaptive system. 

Suppose that many agents interact with an environ- 
ment and each independently attempts to adjust its be- 
havior to the environment based on its sensory stimuli. 
The environment consists of other agents and other ex- 
ogenous influences. The agents could be humans, ani- 
mals, or machines, but we make no assumptions about 
their detailed internal structures. That is, the central
hypothesis in the following is that collective adaptation 
is a dynamical behavior driven by agents' environment- 
mediated interactions. By separating the time scales of 
change in the environment, of agents' adaptation, and of 
agent-agent interactions, our models describe, not the 
deterministic decision-making itself, but the temporal 
change in the probability distribution of choices. 



A. Related Work 

This approach should be compared and contrasted with the game-theoretic view. First, classical game theory often assumes that players have knowledge of
the entire environmental structure and of other players' 
decision-making processes. Our adaptive agents, how- 
ever, have no knowledge of a game in which they might 
be playing. Thus, unlike classical game theory, in our 
setting there is no bird's eye view for the entire collec- 
tive that is available to the agents. Agents have only 
a myopic model of the environment, since any informa- 
tion external to them is given implicitly via the rein- 
forcements for their action choices. Second, although we 
employ game-theoretic concepts such as Nash equilibria, 
we focus almost exclusively on dynamics — transients, at- 
tractors, and so on — of collective adaptation, while, natu- 
rally, making contact with the statics familiar from game 
theory. Finally, despite the differences, game structures 
can be introduced as a set of parameters corresponding 
to approximated static environments. 

While replicator dynamics were introduced originally for evolutionary game theory [..., 15], the relationship between learning with reinforcement and replicator equations has been discussed only recently [10, ...]. Briefly stated, in our model the state space represents an individual agent's probability distribution to choose actions and the adaptation equations describe the temporal evolution of choice probabilities as the agents interact. Here, we extend these considerations to collective adaptation, introducing the theory behind a previously reported model [16, ...]. The overall approach, though, establishes a general framework for dynamical-systems modeling and analysis of adaptive behavior in collectives. It is important to emphasize that our framework goes beyond the multipopulation replicator equations and asymmetric game dynamics, since it does not require a static environment (cf. Refs. [..., 19] for dynamic environments) and it includes the key element of the temporal loss of memory.

We model adaptation in terms of the distribution of 
agents' choices, developing a set of differential equa- 
tions that are a continuous-time limit of a discrete-time 
stochastic process; cf. Ref. [23]. We spend some time
discussing the origin of action probabilities, since this is 
necessary to understand the model variables and also to 
clarify the limits that we invoke to arrive at our model. 
One is tempted to give a game-theoretic interpretation of 
the model and its development. For example, the mixed 
strategies in game play are often interpreted as weights 
over all (complete plans of) actions. However, the game- 
theoretic view is inappropriate for analyzing local, my- 
opic adaptation and the time evolution of collective be- 
havior. 

Another interpretation of our use of action probabili- 
ties comes from regarding them as frequencies of action 
choices. In this view, one needs long-time trials so that 
the frequencies take on statistical validity for an agent. 
Short of this, they would be dominated by fluctuations, 
due to undersampling. In particular, one requires that 
stable limit distributions exist. Moreover, the underlying 
deterministic dynamics of adaptation should be ergodic 
and have strong mixing properties. Finally, considering 
agent-agent interactions, one needs to assume that their 
adaptation is very slow compared to interaction dynam- 
ics. For rapid, say, real-time adaptation, these assump- 
tions would be invalid. Nonetheless, they are appropriate 
for long-term reinforcement, as found in learning motion 
through iterated exercise and learning customs through 
social interaction. 



B. Synopsis 

The approach we take is ultimately phenomenological. 
We are reminded of the reaction-diffusion models of biological morphogenesis introduced originally in Ref. [21].
There, the detailed processes of biological development 
and pattern formation were abstracted, since their bio- 
chemical basis was (and still is) largely unknown, and a 
behavioral phenomenology was developed on this basis. 
Similarly, we abstract the detailed and unknown percep- 
tual processes that underlie agent adaptation and con- 
struct a phenomenology that captures adaptive behavior 
at a larger scale, in agent collectives. 



The phenomenology that we develop for this is one based on communications systems. Agents in a collective are confronted with the same three problems of communication posed by Weaver in the founding work of information theory, The Mathematical Theory of Communication [22]: (a) "How accurately can the symbols of communication be transmitted?", (b) "How precisely do the transmitted symbols convey the desired meaning?", and (c) "How effectively does the received meaning affect conduct in the desired way?" Shannon solved the first problem, developing his theory of error-free transmission [22]. In their vocabulary, adaptive agents are information sources. Each (a) receives information transmitted from the external environment, which includes other agents, (b) interprets the received information and modifies its internal model accordingly, and then, (c) making decisions based on the internal model, generates future behavior.

We will show that this information-theoretic view provides useful tools for analyzing collective adaptation and also an appropriate description for our assumed frequency dynamics. Using these, we derive a new state space based on the self-informations of agents' actions, and this allows one to investigate the dynamics of uncertainty in collective adaptation. It will become clear, though, that the assumption of global information maximization has limited relevance here, even for simple mutual adaptation in a static environment. Instead, self-organization that derives from the information flux between agents gives us a new view of collective adaptation.

To illustrate collective adaptation, we present several simulations of example environments, in particular, those having frustrated agent-agent interactions [...]. Interestingly, for two agents with perfect memory interacting via zero-sum rock-scissors-paper interactions, the dynamics exhibits Hamiltonian chaos. With memory loss, though, the dynamics becomes dissipative and displays the full range of nonlinear dynamical behaviors, including limit cycles, intermittency, and deterministic chaos [...].

The examples illustrate that Nash equilibria often play little or no role in collective adaptation. They are fixed points determined by the intersections of nullclines of the adaptation dynamics, and sometimes the dynamics is explicitly excluded from reaching Nash equilibria, even asymptotically. Rather, it turns out that the network describing the switching between deterministic actions is a dominant factor in structuring the state-space flows. From it, much of the dynamics, including the origins of chaos, becomes intuitively clear.

In the next section (Sec. II) we develop a dynamical system that models adaptive behavior in collectives. In Sec. III we introduce an information-theoretic view and coordinate transformation for adaptation dynamics and discuss self-organization induced by information flux. To illustrate the rich range of behaviors, in Sec. IV we give several examples of adaptive dynamics based on nontransitive interactions. Finally, in Sec. V we interpret our results and suggest future directions.



II. DYNAMICS FOR COLLECTIVE 
ADAPTATION 

Before developing the full equations for a collective of 
adaptive agents, it is helpful to first describe the dy- 
namics of how an individual agent adapts to the con- 
straints imposed by its environment using the memory 
of its past behaviors. We then build up a description of 
how multiple agents interact, focusing only on the addi- 
tional features that come from interaction. The result is 
a set of coupled differential equations that determine the 
behavior of adaptive agent collectives and are amenable 
to various kinds of geometric, statistical, and information-theoretic analyses.



A. Individual Agent Adaptation 

Here we develop a continuous-time model for adap- 
tation in an environment with a single adaptive agent. 
Although the behavior in this case is relatively simple, 
the single-agent case allows us to explain several basic 
points about dynamic adaptation, without the compli- 
cations of a collective and agent-agent interactions. In 
particular, we discuss how and why we go from a discrete- 
time stochastic process to a continuous-time limit. We 
also describe an agent's effective internal model of the 
environment and how we model its adaptation process 
via a probability distribution of action choices. 

An agent takes one of $N$ possible actions, $i = 1, 2, \ldots, N$, at each time step $\tau$. Let the probability for the agent to choose action $i$ be $x_i(\tau)$, where $\tau$ is the number of steps from the initial state $x_i(0)$. The agent's state vector (its choice distribution) at time $\tau$ is $\mathbf{x}(\tau) = (x_1(\tau), x_2(\tau), \ldots, x_N(\tau))$, where $\sum_{n=1}^{N} x_n(\tau) = 1$. In the following we call the temporal behavior of $\mathbf{x}(\tau)$ the dynamics of adaptation.

Let $r_i(\tau)$ denote the reinforcement the agent receives for taking action $i$ at step $\tau$. Denote the collection of these by the vector $\mathbf{r}(\tau) = (r_1(\tau), \ldots, r_N(\tau))$. The agent's memories of past rewards from its actions, denoted $\mathbf{Q}(\tau) = (Q_1(\tau), \ldots, Q_N(\tau))$, are updated according to

$$Q_i(\tau+1) - Q_i(\tau) = \frac{1}{T}\left[\delta_i(\tau)\, r_i(\tau) - \alpha\, Q_i(\tau)\right], \tag{1}$$

where

$$\delta_i(\tau) = \begin{cases} 1, & \text{action } i \text{ chosen at step } \tau,\\ 0, & \text{otherwise,} \end{cases} \tag{2}$$

with $i = 1, \ldots, N$ and $Q_i(0) = 0$. $T$ is a constant that sets the agent-environment interaction time scale. $\alpha \in [0, 1)$ controls the agent's memory loss rate. For $\alpha = 0$, the agent has a perfect memory as the sum of the past reinforcements; for $\alpha > 0$ the memory is attenuated in that older reinforcements have less effect on the current $Q_i$s and more recent reinforcements are given larger weight. One imagines that the agent constructs a histogram of past reinforcements and this serves as a simple internal memory of its environment.

An agent chooses its next action according to its choice distribution, which is updated from the reinforcement memory according to:

$$x_i(\tau) = \frac{e^{\beta Q_i(\tau)}}{\sum_{n=1}^{N} e^{\beta Q_n(\tau)}}, \tag{3}$$

where $i = 1, 2, \ldots, N$. $\beta \in [0, \infty]$ controls the adaptation rate: how much the choice distribution is changed by the memory of past reinforcements. For example, if $\beta = 0$, the choice distribution is unaffected by past reinforcements. Specifically, it becomes independent of $\mathbf{Q}$ and one has $x_i(\tau) = 1/N$. In this case, the agent chooses actions with uniform probability and so behaves completely randomly. In a complementary fashion, in the limit $\beta \to \infty$, an agent chooses the action $i$ with the maximum $Q_i(\tau)$ and $x_i(\tau) \to 1$.
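The two limits of $\beta$ can be checked directly. The following is a minimal Python sketch of Eq. (3), assuming the memories are held in a plain list; the function name `choice_distribution` and the sample memory values are hypothetical. Subtracting the largest $Q_n$ before exponentiating is a standard numerical-stability device that leaves the normalized result unchanged.

```python
import math


def choice_distribution(Q, beta):
    """Eq. (3): x_i = exp(beta * Q_i) / sum_n exp(beta * Q_n)."""
    # Shift by max(Q) to avoid overflow; the shift cancels after normalization.
    q_max = max(Q)
    weights = [math.exp(beta * (q - q_max)) for q in Q]
    total = sum(weights)
    return [w / total for w in weights]


Q = [0.2, 1.0, 0.5]                           # hypothetical memory values
uniform = choice_distribution(Q, beta=0.0)    # beta = 0: every x_i equals 1/N
greedy = choice_distribution(Q, beta=200.0)   # large beta: mass concentrates on argmax Q_i
```

With $\beta = 0$ every weight is $e^0 = 1$, so the distribution is exactly uniform; with a large $\beta$ nearly all probability sits on the action with the largest memory.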

Given Eq. (3), the time evolution of the agent's choice distribution is:

$$x_i(\tau+1) = \frac{x_i(\tau)\, e^{\beta\left(Q_i(\tau+1) - Q_i(\tau)\right)}}{\sum_{n=1}^{N} x_n(\tau)\, e^{\beta\left(Q_n(\tau+1) - Q_n(\tau)\right)}}, \tag{4}$$

where $i = 1, 2, \ldots, N$. This determines how the agent adapts its choice distribution using reinforcements it has received from the environment for its past actions.
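A sketch of the full discrete-time stochastic model may help make Eqs. (1), (2), and (4) concrete. It assumes a static environment in which each action has a fixed reward, and it uses a seeded `random.Random` so runs are reproducible; the helper names and the parameter values are hypothetical. Because $Q_i(0) = 0$ gives a uniform $\mathbf{x}(0)$ consistent with Eq. (3), the multiplicative update (4) should keep agreeing with Eq. (3) evaluated on the current memory, which the final line checks.

```python
import math
import random


def softmax(Q, beta):
    """Eq. (3), shifted by max(Q) for numerical stability."""
    q_max = max(Q)
    w = [math.exp(beta * (q - q_max)) for q in Q]
    s = sum(w)
    return [v / s for v in w]


def interaction_step(Q, x, rewards, beta, alpha, T, rng):
    """One step tau -> tau + 1 of the stochastic model, Eqs. (1), (2), and (4)."""
    N = len(Q)
    i = rng.choices(range(N), weights=x)[0]             # draw action i with probability x_i
    delta = [1.0 if n == i else 0.0 for n in range(N)]  # Eq. (2)
    Q_new = [Q[n] + (delta[n] * rewards[n] - alpha * Q[n]) / T
             for n in range(N)]                          # Eq. (1)
    w = [x[n] * math.exp(beta * (Q_new[n] - Q[n])) for n in range(N)]
    s = sum(w)
    return Q_new, [v / s for v in w]                     # Eq. (4)


rng = random.Random(0)
rewards = [0.3, -1.0, 0.7]       # hypothetical static reward r_i for each action
beta, alpha, T = 1.0, 0.1, 10.0  # hypothetical parameter values
Q = [0.0, 0.0, 0.0]
x = softmax(Q, beta)             # Q_i(0) = 0 gives the uniform distribution
for _ in range(500):
    Q, x = interaction_step(Q, x, rewards, beta, alpha, T, rng)

# The multiplicative update (4) should reproduce Eq. (3) applied to the current memory.
gap = max(abs(a - b) for a, b in zip(x, softmax(Q, beta)))
```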

This simple kind of adaptation was introduced as a principle of behavioral learning [..., 25] and as a model of stochastic learning [...], and is sometimes referred to as reinforcement learning [27, ...]. Arguably, it is the simplest form of adaptation in which an agent develops relationships or behavior patterns through reinforcements from external stimuli.

Starting with the discrete-time model above, one can develop a continuous-time model that corresponds to the agent performing a large number of actions (iterates of Eq. (1)) for each choice distribution update (iterate of Eq. (3)). Thus, we recognize two different time scales: one for agent-environment interactions and one for adaptation of the agent's internal model based on its internal memory. We assume that the adaptation dynamics is very slow compared to interactions and so $\mathbf{x}$ is essentially constant during interactions. (See Fig. 1.)

Starting from Eq. (1), one can show that the continuous-time dynamics of memory updates is given by the differential equations

$$\dot{Q}_i(t) = R_i(t) - \alpha\, Q_i(t), \tag{5}$$

with $i = 1, 2, \ldots, N$ and $Q_i(0) = 0$ (see the appendices). Here $R_i$ is the reward the environment gives to the agent choosing action $i$: the average of $r_i(\tau)$ during the time interval between updates of $\mathbf{x}$ at $t$ and $t + dt$.
FIG. 1: The time scale ($\tau$) of a single agent interacting with its environment and the time scale ($t$) of the agent's adaptation.



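From Eq. (3) one sees that the map from $\mathbf{Q}(t)$ to $\mathbf{x}(t)$ at time $t$ is given by

$$x_i(t) = \frac{e^{\beta Q_i(t)}}{\sum_{n=1}^{N} e^{\beta Q_n(t)}}, \tag{6}$$

where $i = 1, 2, \ldots, N$. Differentiating Eq. (6) gives the continuous-time dynamics

$$\dot{x}_i(t) = \beta\, x_i(t)\left(\dot{Q}_i(t) - \sum_{n=1}^{N} \dot{Q}_n(t)\, x_n(t)\right), \tag{7}$$

with $i = 1, 2, \ldots, N$.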



Assembling Eqs. (5), (6), and (7), one finds the basic dynamic that governs agent behavior on the adaptation time scale. Substituting Eq. (5) into Eq. (7) and using $\beta Q_i = \log x_i + \log \sum_{n=1}^{N} e^{\beta Q_n}$, which follows from Eq. (6), gives

$$\frac{\dot{x}_i}{x_i} = \beta\left(R_i - R\right) + \alpha\left(H_i - H\right), \tag{8}$$

where $i = 1, 2, \ldots, N$. Here

$$R = \sum_{n=1}^{N} x_n R_n \tag{9}$$

is the net reinforcement averaged over the agent's possible actions. And,

$$H_i = -\log x_i, \tag{10}$$

where $i = 1, 2, \ldots, N$, is the self-information or degree of surprise when the agent takes action $i$ [22]. The average self-information, or Shannon entropy of the choice distribution, also appears as

$$H = \sum_{n=1}^{N} x_n H_n = -\sum_{n=1}^{N} x_n \log x_n. \tag{11}$$

These quantities are measured, not in bits (binary digits), but in nats (natural digits), since the natural logarithm is used. The entropy measures the choice distribution's flatness, being maximized when the choices all have equal probability.
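To make the two terms of Eq. (8) concrete, here is a minimal Python sketch of the resulting vector field $\dot{x}_i = x_i[\beta(R_i - R) + \alpha(H_i - H)]$. It assumes an interior state (all $x_i > 0$, so the logarithms are defined); the function name and the sample numbers are hypothetical.

```python
import math


def adaptation_rhs(x, R, beta, alpha):
    """dx_i/dt from Eq. (8): x_i * [beta (R_i - R) + alpha (H_i - H)], with x_i > 0."""
    R_mean = sum(xn * Rn for xn, Rn in zip(x, R))     # Eq. (9)
    H_self = [-math.log(xn) for xn in x]              # Eq. (10), in nats
    H = sum(xn * h for xn, h in zip(x, H_self))       # Eq. (11)
    return [xi * (beta * (Ri - R_mean) + alpha * (hi - H))
            for xi, Ri, hi in zip(x, R, H_self)]


x = [0.5, 0.3, 0.2]      # hypothetical interior state
R = [1.0, -0.5, 0.2]     # hypothetical rewards R_i
v = adaptation_rhs(x, R, beta=0.1, alpha=0.3)
decay = adaptation_rhs(x, R, beta=0.0, alpha=0.3)          # memory loss only
flat = adaptation_rhs([1 / 3] * 3, R, beta=0.0, alpha=0.3)  # uniform state, no adaptation
```

Because $\sum_i x_i (R_i - R) = \sum_i x_i (H_i - H) = 0$, the components of the vector field sum to zero and the flow stays on the simplex. With memory loss alone, the most likely action loses probability and the least likely gains it, and at the uniform distribution the memory-loss term vanishes.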



Fortunately, the basic dynamic captured by Eq. (8) is quite intuitive, being the balance of two terms on the right-hand side. The first term describes an adaptation dynamic, whose time scale is controlled by $\beta$. The second describes the loss of memory with a time scale controlled by $\alpha$. That is, the adaptation in choice probabilities is driven by a balance between two forces: the tendency to concentrate the choice probability based on the reinforcement $\mathbf{R} = (R_1, R_2, \ldots, R_N)$ and the tendency to make choices equally likely. Finally, on the left-hand side, one has the logarithmic derivative of the choice probabilities: $\dot{x}_i/x_i = \frac{d}{dt}\log x_i$.

Note that each of the terms on the right-hand side is a difference between a function of a particular choice and that function's average. Specifically, the first term $\Delta R_i = R_i - R$ is the relative benefit in choosing action $i$ compared to the mean reinforcement across all choices. Other things being held constant, if this term is positive, then action $i$ is the better choice compared to the mean and $x_i$ will increase. The second term $\Delta H_i = H_i - H$ is the relative informativeness of taking action $i$ compared to the average $H$, that is, the Shannon entropy. Thus, the probability of a likely action ($H_i < H$) decreases while that of an unlikely action ($H_i > H$) increases, so this term works to increase the uncertainty of the agent's actions, flattening the choice distribution by increasing the probability of unlikely actions. When $x_i = N^{-1}$, the distribution is flat (purely random choices), $\Delta H_i = 0$, and memory loss effects disappear.

Mathematically, the adaptation equations have quite 
a bit of structure and this has important consequences, 
as we will see. Summarizing, the adaptation equations 
describe a dynamic that balances the tendency to concen- 
trate on choices associated with the best action against 
the tendency to make the choices equally likely. The net 
result is to increase the choice uncertainty, subject to 
the constraints imposed by the environment via the re- 
inforcements. Thus, the choice distribution is the least 
biased distribution consistent with environmental constraints and individual memory loss. We will return to discuss this mechanism in detail using information theory in Sec. III.



[Figure: probability distribution over actions; labels: Adaptation, Memory Loss, Probability Distribution, Action.]

FIG. 2: A dynamic balance of adaptation and memory loss: Adaptation concentrates the probability distribution on the best action. Memory loss of past history leads to a distribution that is flatter and has higher entropy.

Since the reinforcement determines the agent's interactions with the environment, there are, in fact, three different time scales operating: that for agent-environment interactions, that for each agent's adaptation, and that for changes to the environment. However, if the environment changes very slowly compared to the agent's internal adaptation, the environment $r_i(t)$ can be regarded as effectively constant, as shown in Fig. 3.



[Figure: time-scale diagram; labels: Interaction, Environment.]

FIG. 3: The time scales of dynamic adaptation: Agent adaptation is slow compared to agent-environment interaction and environmental change is slower still compared to adaptation.

In this case $r_i(t)$ can be approximated as a static relationship between an agent's actions and the reinforcements given by the environment. Let $r_i(t) = a_i$, where $\mathbf{a} = (a_1, \ldots, a_N)$ are constants that are normalized: $\sum_{n=1}^{N} a_n = 0$. Given this, the agent's time-average reinforcements are $R_i = a_i$ and the continuous-time dynamic simplifies to:

$$\frac{\dot{x}_i}{x_i} = \beta\left(a_i - \sum_{n=1}^{N} a_n x_n\right) + \alpha\left(-\log x_i + \sum_{n=1}^{N} x_n \log x_n\right), \tag{12}$$

where $i = 1, 2, \ldots, N$.

The behavior of single-agent adaptation given by Eq. (12) is very simple. When $\alpha$ is small, so that adaptation is dominant, $x_i \to 1$, where $i$ is the action with the highest reward $a_i$, and $x_j \to 0$ for $j \neq i$. The agent receives this information from the fixed environment; its behavior is simply to choose the action with the maximum reward, and the choice distribution moves to the associated simplex vertex $\mathbf{x}^* = (0, \ldots, 1_i, \ldots, 0)$. In the special case when $\alpha = 0$, it is known that for arbitrary $\mathbf{a}$ Eq. (12) moves $\mathbf{x}$ to the vertex corresponding to the maximum $a_i$ [...]. In a complementary way, when $\alpha$ is large enough to overcome the relative differences in reinforcements, that is, when $\beta/\alpha \to 0$, memory loss dominates, the agent state goes to a uniform choice distribution ($x_i = N^{-1}$), and the system converges to the simplex center. Note that in machine learning this balance between local optimization and randomized behavior, which selects non-optimal actions, is referred to as the exploitation-exploration trade-off [28].

For instance, consider an agent that takes $N = 3$ actions, $\{1, 2, 3\}$, in an environment described by $\mathbf{a} = (\frac{2}{3}\epsilon, -1 - \frac{1}{3}\epsilon, 1 - \frac{1}{3}\epsilon)$, with $\epsilon \in [-1, 1]$. In the perfect memory case ($\alpha = 0$), the choice distribution converges to the stable fixed point $(0, 0, 1)$, while the other two vertices are unstable hyperbolic fixed points. In the memory loss case ($\alpha > 0$), the dynamics converges to a stable fixed point inside the simplex. (These cases are illustrated in Fig. 4.)

FIG. 4: Dynamics of single-agent adaptation: Here there are three actions, labeled 1, 2, and 3, and the environment gives reinforcements according to $\mathbf{a} = (\frac{2}{3}\epsilon, -1 - \frac{1}{3}\epsilon, 1 - \frac{1}{3}\epsilon)$. The figure shows two trajectories from simulations with $\epsilon = 0.5$ and $\beta = 0.1$ and with $\alpha = 0.0$ (right) and $\alpha = 0.3$ (left).
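The interior fixed point for $\alpha > 0$ can be written down explicitly. Setting $\dot{x}_i = 0$ in Eq. (12) at an interior point gives $\log x_i = (\beta/\alpha)\, a_i + \text{const}$, so $x^*_i = e^{\beta a_i/\alpha} / \sum_{n=1}^{N} e^{\beta a_n/\alpha}$, a Boltzmann-like distribution that flattens as $\beta/\alpha \to 0$. The Python sketch below integrates Eq. (12) for the example above with $\epsilon = 0.5$ and $\beta = 0.1$ and compares the result with this formula. Forward Euler is our own simple choice of integrator, and the initial condition and function names are hypothetical; the figure's actual initial conditions are not given here.

```python
import math


def rhs12(x, a, beta, alpha):
    """Eq. (12) multiplied through by x_i, giving dx_i/dt; requires x_i > 0."""
    mean_a = sum(an * xn for an, xn in zip(a, x))        # sum_n a_n x_n
    neg_entropy = sum(xn * math.log(xn) for xn in x)     # sum_n x_n log x_n = -H
    return [xi * (beta * (ai - mean_a) + alpha * (-math.log(xi) + neg_entropy))
            for xi, ai in zip(x, a)]


def integrate(x0, a, beta, alpha, dt=0.01, t_end=400.0):
    """Forward-Euler integration; any standard ODE solver would also do."""
    x = list(x0)
    for _ in range(int(round(t_end / dt))):
        v = rhs12(x, a, beta, alpha)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def boltzmann_fixed_point(a, beta, alpha):
    """Interior fixed point x*_i proportional to exp(beta a_i / alpha), for alpha > 0."""
    w = [math.exp(beta * ai / alpha) for ai in a]
    s = sum(w)
    return [wi / s for wi in w]


eps, beta = 0.5, 0.1
a = [2 * eps / 3, -1 - eps / 3, 1 - eps / 3]   # normalized: the entries sum to zero
x0 = [0.6, 0.3, 0.1]                           # hypothetical initial condition

x_perfect = integrate(x0, a, beta, alpha=0.0)  # approaches the vertex (0, 0, 1)
x_lossy = integrate(x0, a, beta, alpha=0.3)    # approaches an interior point
x_star = boltzmann_fixed_point(a, beta, alpha=0.3)
```

The Euler scheme has the same fixed points as the differential equation, so once transients have decayed the simulated state should match the closed-form $\mathbf{x}^*$ to high accuracy.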

Even when the environment is time-dependent, the 
agent's behavior can track the highest-reward action as 
long as the time scale of environment change is slow com- 
pared to the agent's adaptation. However, the situation 
is more interesting when environment change occurs at 
a rate near the time-scale set by adaptation. Mutual 
adaptation in agent collectives, the subject of the following sections, corresponds to just this situation. Other agents provide, through their own adaptation, a dynamic environment to any given agent, and if their time scales of adaptation are close, the dynamics can be quite rich and difficult to predict and analyze.

B. Two Agent Adaptation 

To develop equations of motion for adaptation in an 
agent collective we initially assume, for simplicity, that 
there are only two agents. The agents, denoted $X$ and $Y$, at each moment take one of $N$ or $M$ actions, respectively. The agents' states at time $t$ are $\mathbf{x} = (x_1, \ldots, x_N)$ and $\mathbf{y} = (y_1, \ldots, y_M)$, with $\sum_{n=1}^{N} x_n = \sum_{m=1}^{M} y_m = 1$. $\mathbf{x}(0)$ and $\mathbf{y}(0)$ are the initial conditions. We view the time evolution of each agent's state vector in the simplices $\mathbf{x} \in \Delta_X$ and $\mathbf{y} \in \Delta_Y$ and the group dynamics in the collective state space $\Delta$, which is the product of the agent simplices:

$$\mathbf{X} = (\mathbf{x}, \mathbf{y}) \in \Delta = \Delta_X \times \Delta_Y. \tag{13}$$

There are again three different time scales to consider: 
one for agent-agent interaction, one for each agent's in- 
ternal adaptation, and one for the environment which 
now mediates agent interactions via the reinforcements 
given to the agents. Here we distinguish between the 
global environment experienced by the agents and the 
external environment, which is the global environment with the agent states removed. The external environment controls, for example, the degree of coupling between the agents. In contrast with the single-agent case, in the many-agent setting each agent's behavior produces a dynamic global environment for the other. This environment dynamics is particularly important when the adaptation time scales of each agent are close.

Following the single-agent case, though, we assume 
that the adaptation dynamic is very slow compared to 
that of agent-agent interactions and that the dynamics 
of the external environment changes very slowly com- 
pared to that of agents' mutual adaptation. Under these 
assumptions the agent state vectors x and y are effec- 
tively constant during the agent-agent interactions that 
occur between adaptation updates. The immediate consequence is that one can describe the collective state space in terms of the frequencies of actions (the choice distributions). Additionally, the environment is essentially
constant relative to changes in the states x and y. 



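Denote the agents' memories by $\mathbf{Q}^X = (Q^X_1, \ldots, Q^X_N)$ for $X$ and $\mathbf{Q}^Y = (Q^Y_1, \ldots, Q^Y_M)$ for $Y$. For the dynamic governing memory updates we have

$$\begin{aligned}
Q^X_i(\tau+1) - Q^X_i(\tau) &= \frac{1}{T}\left[\sum_{j=1}^{M}\delta_{ij}(\tau)\, r^X_{ij}(\tau) - \alpha_X\, Q^X_i(\tau)\right],\\
Q^Y_j(\tau+1) - Q^Y_j(\tau) &= \frac{1}{T}\left[\sum_{i=1}^{N}\delta_{ij}(\tau)\, r^Y_{ji}(\tau) - \alpha_Y\, Q^Y_j(\tau)\right],
\end{aligned} \tag{14}$$

where

$$\delta_{ij}(\tau) = \begin{cases} 1, & \text{pair of actions } (i, j) \text{ chosen at step } \tau,\\ 0, & \text{otherwise,} \end{cases} \tag{15}$$

with $i = 1, \ldots, N$, $j = 1, \ldots, M$, and $Q^X_i(0) = 0$, $Q^Y_j(0) = 0$. $T$ is a time constant. Then the continuous-time dynamics of memory updates for $X$ and $Y$ are given by the differential equations

$$\dot{Q}^X_i = R^X_i - \alpha_X\, Q^X_i, \qquad \dot{Q}^Y_j = R^Y_j - \alpha_Y\, Q^Y_j, \tag{16}$$

for $i = 1, 2, \ldots, N$ and $j = 1, 2, \ldots, M$. $R^X_i$ is the reward for agent $X$ choosing action $i$, averaged over agent $Y$'s actions between adaptive updates, and $R^Y_j$ is $Y$'s. The parameters $\alpha_X, \alpha_Y \in [0, 1)$ control each agent's memory loss rate, respectively.

The map from $\mathbf{Q}^X(t)$ to $\mathbf{x}(t)$ and from $\mathbf{Q}^Y(t)$ to $\mathbf{y}(t)$ at time $t$ is

$$x_i(t) = \frac{e^{\beta_X Q^X_i(t)}}{\sum_{n=1}^{N} e^{\beta_X Q^X_n(t)}}, \qquad y_j(t) = \frac{e^{\beta_Y Q^Y_j(t)}}{\sum_{m=1}^{M} e^{\beta_Y Q^Y_m(t)}}, \tag{17}$$

for $i = 1, \ldots, N$ and $j = 1, \ldots, M$. Here $\beta_X, \beta_Y \in [0, \infty]$ control the agents' adaptation rates, respectively. Differentiating Eq. (17) with respect to $t$, the continuous-time adaptation for two agents is governed by

$$\dot{x}_i = \beta_X\, x_i\left(\dot{Q}^X_i - \sum_{n=1}^{N} \dot{Q}^X_n x_n\right), \qquad \dot{y}_j = \beta_Y\, y_j\left(\dot{Q}^Y_j - \sum_{m=1}^{M} \dot{Q}^Y_m y_m\right), \tag{18}$$

for $i = 1, \ldots, N$ and $j = 1, \ldots, M$.

Putting together Eqs. (16), (17), and (18), one finds the coupled adaptation equations for two agents:

$$\begin{aligned}
\frac{\dot{x}_i}{x_i} &= \beta_X\left(R^X_i - R^X\right) + \alpha_X\left(H^X_i - H^X\right),\\
\frac{\dot{y}_j}{y_j} &= \beta_Y\left(R^Y_j - R^Y\right) + \alpha_Y\left(H^Y_j - H^Y\right),
\end{aligned} \tag{19}$$

for $i = 1, \ldots, N$ and $j = 1, \ldots, M$, where $H^X_i = -\log x_i$, $H^Y_j = -\log y_j$, and

$$\begin{aligned}
R^X &= \sum_{n=1}^{N} x_n R^X_n, & R^Y &= \sum_{m=1}^{M} y_m R^Y_m,\\
H^X &= -\sum_{n=1}^{N} x_n \log x_n, & H^Y &= -\sum_{m=1}^{M} y_m \log y_m.
\end{aligned} \tag{20}$$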



The interpretations of the $\Delta R = R_i - R$ and $\Delta H = H_i - H$ terms are not essentially different from those
introduced to describe the single-agent case. That is, 
the behavior of each agent is a dynamic balance between 
(i) adaptation: concentrating the choice probability on 
the best action at t and (ii) memory loss: increasing the 
choice uncertainty. What is new here is that there are 
two (and eventually more) agents attempting to achieve 
this balance together using information that comes from 
their interactions with the global environment. 

As given, the adaptation equations include the possi- 
bility of a time-dependent environment, which would be 
implemented, say, using a time-dependent reinforcement 
scheme. However, as with the single-agent case, it is 
helpful to simplify the model by assuming a static exter- 
nal environment and, in particular, static relationships 
between the agents. 

Assume that the external environment changes slowly compared to the dynamics of mutual adaptation, as illustrated in Fig. 3. This implies a nearly static relationship between pairs of action choices $(i, j)$ and the reinforcements $r^X_{ij}$ and $r^Y_{ji}$ for both agents. Since the environmental dynamics is very slow compared to each agent's adaptation, $r^X_{ij}(t)$ and $r^Y_{ji}(t)$ are essentially constant during adaptation. The $r$s can then be approximated as constants:

$$r^X_{ij}(t) = a_{ij}, \qquad r^Y_{ji}(t) = b_{ji}, \tag{21}$$

for $i = 1, \ldots, N$ and $j = 1, \ldots, M$. The $a_{ij}$ are normalized over $i$ (for each $j$) and the $b_{ji}$ over $j$ (for each $i$), so that when summing over an agent's own actions the reinforcements vanish:

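$$\sum_{n=1}^{N} a_{nj} = 0, \qquad \sum_{m=1}^{M} b_{mi} = 0. \tag{22}$$

Given the form of $\Delta R$ in the adaptation equations, this normalization does not affect the dynamics: adding the same constant to $a_{ij}$ for every $i$ (at fixed $j$) raises every $R^X_i$ by the same amount, which cancels in $\Delta R^X_i = R^X_i - R^X$, and likewise for $b_{ji}$.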
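The invariance can be checked numerically. The sketch below assumes the static-environment form of the rewards, $R^X_i = \sum_m a_{im} y_m$, and compares $\Delta R^X_i$ before and after shifting each column of a hypothetical $3 \times 2$ matrix $A$ so that it satisfies Eq. (22); the function names and numbers are illustrative only.

```python
def delta_R(A, x, y):
    """Delta R^X_i = (A y)_i - x . A y for agent X in a static environment."""
    Ay = [sum(a_im * y_m for a_im, y_m in zip(row, y)) for row in A]
    mean = sum(x_n * r for x_n, r in zip(x, Ay))
    return [r - mean for r in Ay]


def normalize_columns(A):
    """Shift each column so that sum_n a_nj = 0, as in Eq. (22)."""
    N, M = len(A), len(A[0])
    col_mean = [sum(A[n][j] for n in range(N)) / N for j in range(M)]
    return [[A[i][j] - col_mean[j] for j in range(M)] for i in range(N)]


A = [[1.0, 4.0], [2.0, -3.0], [0.5, 1.5]]  # hypothetical 3x2 reinforcement matrix for X
x, y = [0.2, 0.5, 0.3], [0.6, 0.4]
A_norm = normalize_columns(A)
before, after = delta_R(A, x, y), delta_R(A_norm, x, y)
```

Shifting column $j$ by $c_j$ changes every $(A\mathbf{y})_i$ by $\sum_j c_j y_j$ and, because $\sum_n x_n = 1$, changes the mean by the same amount, so `before` and `after` agree.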

Assume further that $\mathbf{x}$ and $\mathbf{y}$ are independently distributed. This is equivalent to agents never having a global view of the collective or their interactions with the environment (other agents). Each agent's knowledge of the environment is uncorrelated, at each moment, with the state of the other agents. The time-average rewards for $X$ and $Y$ now become

$$R^X_i = \sum_{m=1}^{M} a_{im}\, y_m, \qquad R^Y_j = \sum_{n=1}^{N} b_{jn}\, x_n, \tag{23}$$

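for $i = 1, \ldots, N$ and $j = 1, \ldots, M$. In this restricted case, the continuous-time dynamic is given by the coupled adaptation equations

$$\begin{aligned}
\frac{\dot{x}_i}{x_i} &= \beta_X\left[(A\mathbf{y})_i - \mathbf{x}\cdot A\mathbf{y}\right] + \alpha_X\left[-\log x_i + \sum_{n=1}^{N} x_n \log x_n\right],\\
\frac{\dot{y}_j}{y_j} &= \beta_Y\left[(B\mathbf{x})_j - \mathbf{y}\cdot B\mathbf{x}\right] + \alpha_Y\left[-\log y_j + \sum_{m=1}^{M} y_m \log y_m\right],
\end{aligned} \tag{24}$$

for $i = 1, \ldots, N$ and $j = 1, \ldots, M$. $A$ is an $N \times M$ matrix and $B$ is an $M \times N$ matrix with $(A)_{ij} = a_{ij}$ and $(B)_{ji} = b_{ji}$, respectively. $\mathbf{x}\cdot A\mathbf{y}$ is the inner product between $\mathbf{x}$ and $A\mathbf{y}$, and similarly for $\mathbf{y}\cdot B\mathbf{x}$:

$$\mathbf{x}\cdot A\mathbf{y} = \sum_{n=1}^{N}\sum_{m=1}^{M} a_{nm}\, x_n y_m, \qquad \mathbf{y}\cdot B\mathbf{x} = \sum_{m=1}^{M}\sum_{n=1}^{N} b_{mn}\, y_m x_n. \tag{25}$$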
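A minimal Python sketch of the coupled vector field in Eqs. (24), assuming both agents are in the interiors of their simplices. Matching pennies, a zero-sum $2 \times 2$ game with $B = -A^{\mathsf T}$, is a hypothetical example chosen here only because its mixed equilibrium is easy to state; the parameter values and helper names are also hypothetical.

```python
import math


def matvec(M, v):
    """Matrix-vector product for nested lists: (M v)_i = sum_k M[i][k] v[k]."""
    return [sum(m_ik * v_k for m_ik, v_k in zip(row, v)) for row in M]


def agent_rhs(p, payoff, beta, alpha):
    """One agent's block of Eqs. (24), multiplied through by p_i; requires p_i > 0."""
    mean_payoff = sum(pi * ri for pi, ri in zip(p, payoff))
    neg_H = sum(pi * math.log(pi) for pi in p)         # sum_n p_n log p_n
    return [pi * (beta * (ri - mean_payoff) + alpha * (-math.log(pi) + neg_H))
            for pi, ri in zip(p, payoff)]


def two_agent_rhs(x, y, A, B, beta_x, beta_y, alpha_x, alpha_y):
    """Coupled Eqs. (24): X is driven by (A y)_i, Y by (B x)_j."""
    dx = agent_rhs(x, matvec(A, y), beta_x, alpha_x)
    dy = agent_rhs(y, matvec(B, x), beta_y, alpha_y)
    return dx, dy


# Hypothetical example: matching pennies, a zero-sum 2x2 game (B = -A^T).
A = [[1.0, -1.0], [-1.0, 1.0]]
B = [[-1.0, 1.0], [1.0, -1.0]]
dx_c, dy_c = two_agent_rhs([0.5, 0.5], [0.5, 0.5], A, B, 0.1, 0.1, 0.01, 0.01)
dx, dy = two_agent_rhs([0.7, 0.3], [0.4, 0.6], A, B, 0.1, 0.1, 0.01, 0.01)
```

At $\mathbf{x} = \mathbf{y} = (1/2, 1/2)$ both the reinforcement and the memory-loss terms vanish, so this point is a fixed point; at any interior point each agent's velocity components sum to zero, keeping the flow on $\Delta_X \times \Delta_Y$.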



C. Collective Adaptation 

Generalizing to an arbitrary number of agents at this point should appear straightforward. It simply requires extending Eqs. (19) to a collection of adaptive agents. Suppose there are $S$ agents labeled $s = 1, 2, \ldots, S$ and each agent can take one of $N^s$ actions. One describes the time evolution of the agents' state vectors in the simplices $\mathbf{x}^1 \in \Delta_1$, $\mathbf{x}^2 \in \Delta_2$, ..., and $\mathbf{x}^S \in \Delta_S$. The adaptation dynamics in the higher-dimensional collective state space occurs within

$$\mathbf{X} = (\mathbf{x}^1, \mathbf{x}^2, \ldots, \mathbf{x}^S) \in \Delta = \Delta_1 \times \Delta_2 \times \cdots \times \Delta_S. \tag{26}$$

Then we have the dynamics for collective adaptation as

$$\frac{\dot{x}^s_{i^s}}{x^s_{i^s}} = \beta_s\left(R^s_{i^s} - R^s\right) + \alpha_s\left(H^s_{i^s} - H^s\right), \tag{27}$$

for $i^s = 1, \ldots, N^s$ and $s = 1, \ldots, S$. $R^s_{i^s}$ and $H^s_{i^s}$ are the reinforcement and the self-information for agent $s$ to choose action $i^s$, respectively. Equations (27) constitute our general model for adaptation in agent collectives.

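With three agents $X$, $Y$, and $Z$, with collective state space

$$\mathbf{X} = (\mathbf{x}, \mathbf{y}, \mathbf{z}) \in \Delta = \Delta_X \times \Delta_Y \times \Delta_Z, \tag{28}$$

one obtains:

$$\begin{aligned}
\frac{\dot{x}_i}{x_i} &= \beta_X\left(R^X_i - R^X\right) + \alpha_X\left(H^X_i - H^X\right),\\
\frac{\dot{y}_j}{y_j} &= \beta_Y\left(R^Y_j - R^Y\right) + \alpha_Y\left(H^Y_j - H^Y\right),\\
\frac{\dot{z}_k}{z_k} &= \beta_Z\left(R^Z_k - R^Z\right) + \alpha_Z\left(H^Z_k - H^Z\right),
\end{aligned} \tag{29}$$

for $i = 1, \ldots, N$, $j = 1, \ldots, M$, and $k = 1, \ldots, L$. The static environment version reduces to

$$\begin{aligned}
\frac{\dot{x}_i}{x_i} &= \beta_X\left[(A\mathbf{y}\mathbf{z})_i - \mathbf{x}\cdot A\mathbf{y}\mathbf{z}\right] + \alpha_X\left[-\log x_i + \sum_{n=1}^{N} x_n \log x_n\right],\\
\frac{\dot{y}_j}{y_j} &= \beta_Y\left[(B\mathbf{z}\mathbf{x})_j - \mathbf{y}\cdot B\mathbf{z}\mathbf{x}\right] + \alpha_Y\left[-\log y_j + \sum_{m=1}^{M} y_m \log y_m\right],\\
\frac{\dot{z}_k}{z_k} &= \beta_Z\left[(C\mathbf{x}\mathbf{y})_k - \mathbf{z}\cdot C\mathbf{x}\mathbf{y}\right] + \alpha_Z\left[-\log z_k + \sum_{l=1}^{L} z_l \log z_l\right],
\end{aligned} \tag{30}$$

for $i = 1, \ldots, N$, $j = 1, \ldots, M$, and $k = 1, \ldots, L$, and with tensors $(A)_{ijk} = a_{ijk}$, $(B)_{jki} = b_{jki}$, and $(C)_{kij} = c_{kij}$. Here

$$(A\mathbf{y}\mathbf{z})_i = \sum_{m=1}^{M}\sum_{l=1}^{L} a_{iml}\, y_m z_l \tag{31}$$

and

$$\mathbf{x}\cdot A\mathbf{y}\mathbf{z} = \sum_{n=1}^{N}\sum_{m=1}^{M}\sum_{l=1}^{L} a_{nml}\, x_n y_m z_l, \tag{32}$$

and similarly for $Y$ and $Z$. Note that the general model includes heterogeneous network settings with local interactions besides global interactions; see the appendices.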
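The tensor contractions in Eqs. (31) and (32) can be sketched in a few lines of Python, assuming each tensor is stored as nested lists in the index order given above; the tensor entries and the helper names are hypothetical.

```python
import itertools


def contract_pair(T, u, v):
    """(T u v)_i = sum_m sum_l T[i][m][l] u[m] v[l], as in Eq. (31)."""
    return [sum(T_i[m][l] * u[m] * v[l]
                for m, l in itertools.product(range(len(u)), range(len(v))))
            for T_i in T]


def full_contraction(T, x, y, z):
    """x . T y z = sum_n sum_m sum_l T[n][m][l] x[n] y[m] z[l], as in Eq. (32)."""
    return sum(T[n][m][l] * x[n] * y[m] * z[l]
               for n, m, l in itertools.product(range(len(x)), range(len(y)), range(len(z))))


# Hypothetical 2x2x2 reinforcement tensor for X, indexed A[i][m][l].
A = [[[1.0, -1.0], [0.5, 2.0]],
     [[-2.0, 0.0], [1.0, -0.5]]]
x, y, z = [0.3, 0.7], [0.4, 0.6], [0.5, 0.5]
Ayz = contract_pair(A, y, z)          # (0.75, -0.25) for these numbers
xAyz = full_contraction(A, x, y, z)   # 0.3 * 0.75 + 0.7 * (-0.25) = 0.05
# With B indexed B[j][k][i] and C indexed C[k][i][j], the same helper gives
# (B z x)_j = contract_pair(B, z, x) and (C x y)_k = contract_pair(C, x, y).
```

As expected from the definitions, $\mathbf{x}\cdot A\mathbf{y}\mathbf{z} = \sum_i x_i (A\mathbf{y}\mathbf{z})_i$, so the average reinforcement is consistent with the per-action reinforcements.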
D. Evolutionary Dynamics and Game Theory

We now interrupt the development to discuss the connections between the model developed thus far and models from population dynamics and game theory. There are interesting connections, and also some important distinctions that must be kept in mind before we move forward.

The special case that allows us to make contact with evolutionary dynamics and game theory is the restriction to agents with perfect memory interacting in a static environment. (For further details see App. C.) In the two-agent, static external environment case we set $\alpha_X = \alpha_Y = 0$ and use equal adaptation rates, $\beta_X = \beta_Y$. Under these assumptions our model, Eqs. (24), reduces to what are called either the multipopulation replicator equations or asymmetric game dynamics [?]. The equations are

$$\frac{\dot{x}_i}{x_i} = (A\mathbf{y})_i - \mathbf{x}\cdot A\mathbf{y} , \qquad \frac{\dot{y}_j}{y_j} = (B\mathbf{x})_j - \mathbf{y}\cdot B\mathbf{x} . \tag{33}$$

From the perspective of game theory, one regards the interactions determined by $A$ and $B$ as $X$'s and $Y$'s payoff matrices, respectively, for a linear game in which $X$ plays action $i$ against $Y$'s action $j$. Additionally, the agent state vectors $\mathbf{x}$ and $\mathbf{y}$ are interpreted as mixed strategies. In fact, $\mathbf{x}\cdot A\mathbf{y}$ and $\mathbf{y}\cdot B\mathbf{x}$ in Eqs. (33) formally satisfy the conditions for von Neumann-Morgenstern utilities [?]. If interior Nash equilibria of the game $(A, B)$ exist in the interior of the collective simplices $\Delta_X$ and $\Delta_Y$, they are the fixed points determined by the intersections of the $\mathbf{x}$- and $\mathbf{y}$-nullclines of Eqs. (33).

One must be careful, though, in drawing parallels between our general dynamic setting and classical game theory. For idealized economic agents, it is often assumed that agents know the entire game structure and the other agents' decision-making processes. The central methodology of game theory derives how these rational players should act. Our adaptive agents, in contrast, have no knowledge of any game they might be playing. They have only a myopic model of the environment, and even that is given only implicitly, via the reinforcements they receive from the environment. In particular, the agents do not know whether they are playing a game, how many other agents there are, or even whether other agents exist at all. Our model of dynamic adaptation under such constraints is nonetheless appropriate for many real-world adaptive systems, whether collectives of animals, humans, or economic agents [?]. The bimatrix game $(A, B)$ appears above as a description of the collective's global dynamics only under the assumption that the external environment changes very slowly.

The connection with evolutionary dynamics is formal. It comes from the fact that Eqs. (33) are the well-known replicator equations of population dynamics [?]. However, the variables are interpreted rather differently. Population dynamics views $\mathbf{x}$ and $\mathbf{y}$ as two separate but interacting groups of infinite size. These two populations are described as distributions of various organismal phenotypes, and the equations of motion determine how the populations evolve over generations through interaction. In our model, in contrast, $\mathbf{x}$ and $\mathbf{y}$ are each agent's probabilities of choosing its actions, and the equations of motion describe how the agents dynamically adapt to each other through interaction.

Despite the similarities that can be drawn in this special case, it is important to emphasize that our framework goes beyond the multipopulation replicator equations and asymmetric game dynamics. First, the reinforcement scheme $R$ need not lead to linear interactions. Second, the model does not require a static environment described by a constant bimatrix $(A, B)$. Finally, the memory loss term is entirely new and has no counterpart in game theory or evolutionary dynamics.

III. INFORMATION, UNCERTAINTY, AND 
DYNAMIC ADAPTATION 

We now shift away from a dynamical systems view and, as promised earlier, begin to think of the agent collective as a communication network. Although this will initially appear unrelated, we will show that there is a close connection between the dynamical and the information-theoretic perspectives, with both mathematical and pragmatic consequences.

We consider the adaptive agents in the collective to be information sources. Each agent receives information from its environment, which includes the other agents. Each agent interprets the received information and modifies its behavior accordingly, changing from $\mathbf{x}(t)$ to $\mathbf{x}(t+dt)$. It then generates a series of messages (actions) based on its updated internal model and introduces this new behavior back into the environment. This is a different interpretation of the interaction process in the collective, which up to now we have motivated only as a dynamical process. We now discuss the adaptive dynamics from an information-theoretic viewpoint.

A. Dynamics in Information Space

In this section we introduce a new state space that directly represents the uncertainties of agent actions. First, as before, for clarity we focus on the two-agent static-environment case, Eqs. (24). Since the components of the agents' states are probabilities, the quantities

$$\xi_i = -\log x_i , \qquad \eta_j = -\log y_j , \tag{34}$$

are the self-informations of agents $X$ and $Y$ choosing actions $i$ and $j$, respectively. When $x_i$ is small, for example, the self-information $\xi_i$ is large, since action $i$ is rarely chosen by agent $X$. Consider the resulting change of coordinates in $\mathbb{R}^N_+ \times \mathbb{R}^M_+$:

$$\Xi = (\boldsymbol{\xi}, \boldsymbol{\eta}) = (\xi_1, \ldots, \xi_N) \times (\eta_1, \ldots, \eta_M) . \tag{35}$$

The normalization conditions $\sum_{n=1}^{N} x_n = \sum_{m=1}^{M} y_m = 1$, which restrict the agent states to lie in simplices, become $\sum_{n=1}^{N} e^{-\xi_n} = \sum_{m=1}^{M} e^{-\eta_m} = 1$ in $\Xi$.

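In this space the equations of motion become

$$\begin{aligned}
\dot{\xi}_i &= -\beta_X\left[(A e^{-\boldsymbol{\eta}})_i - e^{-\boldsymbol{\xi}}\cdot A e^{-\boldsymbol{\eta}}\right] - \alpha_X\left[\xi_i - e^{-\boldsymbol{\xi}}\cdot\boldsymbol{\xi}\right],\\
\dot{\eta}_j &= -\beta_Y\left[(B e^{-\boldsymbol{\xi}})_j - e^{-\boldsymbol{\eta}}\cdot B e^{-\boldsymbol{\xi}}\right] - \alpha_Y\left[\eta_j - e^{-\boldsymbol{\eta}}\cdot\boldsymbol{\eta}\right],
\end{aligned} \tag{36}$$

for $i = 1, \ldots, N$ and $j = 1, \ldots, M$, where $e^{-\boldsymbol{\xi}} = (e^{-\xi_1}, \ldots, e^{-\xi_N})$ and $e^{-\boldsymbol{\eta}} = (e^{-\eta_1}, \ldots, e^{-\eta_M})$.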



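Recall that both the $\Delta R$ interaction term and the $\Delta H$ memory loss term are differences from means. This suggests yet another transformation, one that removes these comparisons to the mean:

$$u_i = \xi_i - \frac{1}{N}\sum_{n=1}^{N}\xi_n , \qquad v_j = \eta_j - \frac{1}{M}\sum_{m=1}^{M}\eta_m , \tag{37}$$

with $i = 1, \ldots, N$ and $j = 1, \ldots, M$. This leads to the normalized space in $\mathbb{R}^N \times \mathbb{R}^M$:

$$\mathbf{U} = (\mathbf{u}, \mathbf{v}) = (u_1, \ldots, u_N) \times (v_1, \ldots, v_M) , \tag{38}$$

with the constraints $\sum_{n=1}^{N} u_n = \sum_{m=1}^{M} v_m = 0$. Here $\mathbf{u}$ and $\mathbf{v}$ are the self-informations normalized relative to their means. We refer to this space as information space.

The combined coordinate transformation, Eq. (37) composed with Eq. (34), gives the well-known centered log-ratio coordinates [?]. The inverse transformation is

$$x_i = \frac{e^{-u_i}}{\sum_{n=1}^{N} e^{-u_n}} , \qquad y_j = \frac{e^{-v_j}}{\sum_{m=1}^{M} e^{-v_m}} . \tag{39}$$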
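The coordinate change from $\Delta$ to $\mathbf{U}$ and back, Eqs. (34), (37), and (39), takes only a few lines. The sketch below uses hypothetical function names. Subtracting $\min_n u_n$ before exponentiating in the inverse map is an implementation choice for numerical stability, not part of the model; the shift cancels in the ratio.

```python
import math

def to_information_space(x):
    """Centered log-ratio coordinates, Eqs. (34) and (37): u_i = xi_i - mean(xi)."""
    xi = [-math.log(p) for p in x]          # self-informations, Eq. (34)
    mean_xi = sum(xi) / len(xi)
    return [v - mean_xi for v in xi]        # components sum to zero, Eq. (38)

def from_information_space(u):
    """Inverse map, Eq. (39): x_i = exp(-u_i) / sum_n exp(-u_n)."""
    shift = min(u)                          # keeps every exponent <= 0 (no overflow)
    w = [math.exp(-(v - shift)) for v in u]
    total = sum(w)
    return [v / total for v in w]

x = [0.2, 0.3, 0.5]
u = to_information_space(x)
print(sum(u))                       # ~0: the constraint of Eq. (38)
print(from_information_space(u))    # recovers [0.2, 0.3, 0.5] up to rounding
```

The most probable action has the smallest $u_i$, and the uniform distribution maps to the origin of $\mathbf{U}$.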



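The resulting transformed adaptation equations directly model the dynamics of the uncertainties in the agents' behavior:

$$\begin{aligned}
\dot{\mathbf{u}} &= -\beta_X\left[A\mathbf{y} - \frac{1}{N}\sum_{n=1}^{N}(A\mathbf{y})_n\right] - \alpha_X\mathbf{u},\\
\dot{\mathbf{v}} &= -\beta_Y\left[B\mathbf{x} - \frac{1}{M}\sum_{m=1}^{M}(B\mathbf{x})_m\right] - \alpha_Y\mathbf{v},
\end{aligned} \tag{40}$$

where $\mathbf{x}$ and $\mathbf{y}$ are expressed through Eq. (39), and each scalar mean is subtracted from every component. When the interaction matrices are normalized to zero mean, $\sum_{n=1}^{N} a_{nm} = \sum_{m=1}^{M} b_{mn} = 0$, the equations simplify even further to

$$\dot{\mathbf{u}} = -\beta_X A\mathbf{y} - \alpha_X\mathbf{u} , \qquad \dot{\mathbf{v}} = -\beta_Y B\mathbf{x} - \alpha_Y\mathbf{v} . \tag{41}$$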
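Equations (41) are convenient for simulation. The sketch below integrates Eq. (41) with a classical fourth-order Runge-Kutta step for the Matching Pennies matrices of Sec. IV, whose columns already sum to zero. It uses $\epsilon_X = 0.5$, $\epsilon_Y = -0.3$, $\alpha_X = 0.02$, $\alpha_Y = 0.01$, and $\beta_X = \beta_Y = 1$. The integrator, step size, initial condition, and function names are assumptions for illustration; the authors' own simulation code is not given in the text.

```python
import math

def probs_from_info(u):
    """Inverse map of Eq. (39), shifted by min(u) to avoid overflow."""
    shift = min(u)
    w = [math.exp(-(ui - shift)) for ui in u]
    total = sum(w)
    return [wi / total for wi in w]

def matvec(M, vec):
    return [sum(m * c for m, c in zip(row, vec)) for row in M]

def field(u, v, A, B, alpha_x, alpha_y, beta_x=1.0, beta_y=1.0):
    """Right-hand side of Eq. (41); assumes A and B have zero column sums."""
    x, y = probs_from_info(u), probs_from_info(v)
    du = [-beta_x * a - alpha_x * ui for a, ui in zip(matvec(A, y), u)]
    dv = [-beta_y * b - alpha_y * vj for b, vj in zip(matvec(B, x), v)]
    return du, dv

def rk4_step(u, v, dt, *args):
    """One classical Runge-Kutta step of Eq. (41)."""
    def move(base, k, h):
        return [b + h * d for b, d in zip(base, k)]
    k1 = field(u, v, *args)
    k2 = field(move(u, k1[0], dt / 2), move(v, k1[1], dt / 2), *args)
    k3 = field(move(u, k2[0], dt / 2), move(v, k2[1], dt / 2), *args)
    k4 = field(move(u, k3[0], dt), move(v, k3[1], dt), *args)
    def combine(base, i):
        return [b + dt / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
                for b, a1, a2, a3, a4 in zip(base, k1[i], k2[i], k3[i], k4[i])]
    return combine(u, 0), combine(v, 1)

eps_x, eps_y = 0.5, -0.3
A = [[-eps_x, eps_x], [eps_x, -eps_x]]
B = [[-eps_y, eps_y], [eps_y, -eps_y]]
u, v = [0.8, -0.8], [-0.5, 0.5]
for _ in range(15000):                         # t = 0 ... 1500 with dt = 0.1
    u, v = rk4_step(u, v, 0.1, A, B, 0.02, 0.01)
print(probs_from_info(u), probs_from_info(v))  # both close to [0.5, 0.5]
```

With positive memory loss the orbit spirals into the origin of $\mathbf{U}$, that is, into random behavior. Working in $\mathbf{U}$ avoids the exponentially thin regions near the simplex boundary.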



The origin $O = (0, 0, \ldots, 0)$ of the normalized information space $\mathbf{U}$ corresponds to random behavior: $(\mathbf{x}, \mathbf{y}) = (1/N, \ldots, 1/N, 1/M, \ldots, 1/M)$. The Shannon entropy of the choice distribution is maximized at this point. In contrast, when agents choose an action with probability 1, the entropy vanishes and the agent state lies at a simplex vertex in $\Delta$ and at infinity in $\mathbf{U}$.

In Eqs. (41) the first term is related to the information influx to an agent from outside, that is, from the other agents and the environment. The second term is related to information dissipation due to internal memory loss. Eqs. (41) are useful for theory and for analysis in certain limits, as we will shortly demonstrate. They also improve numerical stability during simulation, as we will illustrate with the example collectives below. Note that Eqs. (24), (36), and (40) are topologically orbit equivalent.



B. Self-organization Induced by Dynamics of 
Uncertainty 

Equations (40) describe a dynamics of uncertainty between deterministic and random behavior. Information influx occurs when the agents adapt to environmental constraints and change their choice distributions accordingly. Information dissipation occurs when memory loss dominates and the agents increase their uncertainty, behaving more randomly with less regard to the environmental constraints. The dissipation rate $\gamma$ of the dynamics in $\mathbf{U}$ is controlled entirely by the memory loss rates $\alpha$:

$$\gamma = \sum_{n=1}^{N}\frac{\partial\dot{u}_n}{\partial u_n} + \sum_{m=1}^{M}\frac{\partial\dot{v}_m}{\partial v_m} = -N\alpha_X - M\alpha_Y . \tag{42}$$

Therefore, Eqs. (41) are volume preserving in $\mathbf{U}$ when $\alpha_X = \alpha_Y = 0$.
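Equation (42) can be checked numerically. The sketch below estimates the divergence of the vector field of Eq. (41) by central finite differences and compares it with $-N\alpha_X - M\alpha_Y$, treating the $N + M$ components of $(\mathbf{u}, \mathbf{v})$ as independent coordinates. The Rock-Scissors-Paper matrices with zero tie reward and the sample point are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math

def probs_from_info(u):
    """Inverse map of Eq. (39)."""
    shift = min(u)
    w = [math.exp(-(ui - shift)) for ui in u]
    total = sum(w)
    return [wi / total for wi in w]

def field(w, A, B, alpha_x, alpha_y):
    """Eq. (41) with beta_X = beta_Y = 1; w lists u followed by v."""
    N = len(A)
    u, v = w[:N], w[N:]
    x, y = probs_from_info(u), probs_from_info(v)
    du = [-sum(a * yk for a, yk in zip(row, y)) - alpha_x * ui for row, ui in zip(A, u)]
    dv = [-sum(b * xk for b, xk in zip(row, x)) - alpha_y * vj for row, vj in zip(B, v)]
    return du + dv

def divergence(w, h, *args):
    """Central-difference estimate of sum_k d(field_k)/d(w_k)."""
    total = 0.0
    for k in range(len(w)):
        wp, wm = list(w), list(w)
        wp[k] += h
        wm[k] -= h
        total += (field(wp, *args)[k] - field(wm, *args)[k]) / (2 * h)
    return total

rsp = [[0.0, 1.0, -1.0], [-1.0, 0.0, 1.0], [1.0, -1.0, 0.0]]
point = [0.3, -0.1, -0.2, 0.5, -0.4, -0.1]
div = divergence(point, 1e-5, rsp, rsp, 0.02, 0.01)
print(div)   # close to -(3 * 0.02 + 3 * 0.01) = -0.09
```

The interaction term of $\dot{u}_i$ depends only on $\mathbf{v}$, and that of $\dot{v}_j$ only on $\mathbf{u}$. Only the memory loss terms therefore contribute to the trace, which is the content of Eq. (42).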

Suppose the agents behave without memory loss ($\alpha_X = \alpha_Y = 0$), the interaction specified by $(A, B)$ is zero-sum, $B = -A^T$, and it determines an interior Nash equilibrium $(\mathbf{x}^*, \mathbf{y}^*)$ (see App. [?]). Then the collective has a constant of motion:

$$E = \beta_X^{-1} D(\mathbf{x}^* \,\|\, \mathbf{x}) + \beta_Y^{-1} D(\mathbf{y}^* \,\|\, \mathbf{y}) , \tag{43}$$

where $D(\mathbf{p} \,\|\, \mathbf{q}) = \sum_k p_k \log(p_k/q_k)$ is the relative entropy, or information gain, which measures the similarity between probability distributions $\mathbf{p}$ and $\mathbf{q}$ [?]. (App. D gives the derivation of Eq. (43).) Since the constant of motion $E$ is a linear sum of relative entropies, the collective maintains the information-theoretic distance between the interior Nash equilibrium and each agent's state. Thus, in the perfect memory case ($\alpha = 0$), since $D(\mathbf{p} \,\|\, \mathbf{q}) \geq 0$ with equality only when $\mathbf{p} = \mathbf{q}$, the interior Nash equilibrium cannot be reached unless the initial condition starts on it (Fig. 5). This is an information-theoretic interpretation of the constant of motion noted in Ref. [?].
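The conservation of $E$ in Eq. (43) is easy to check numerically. The sketch below assumes the zero-sum Matching Pennies interaction of Sec. IV with $\epsilon_X = -\epsilon_Y = 0.5$, so that $B = -A^T$ and $\mathbf{x}^* = \mathbf{y}^* = (1/2, 1/2)$. It also assumes perfect memory and $\beta_X = \beta_Y = 1$. Eq. (41) is integrated with a classical Runge-Kutta step, which conserves $E$ only approximately, so a small numerical drift is expected. Names, step size, and initial condition are illustrative assumptions.

```python
import math

def probs_from_info(u):
    """Inverse map of Eq. (39)."""
    shift = min(u)
    w = [math.exp(-(ui - shift)) for ui in u]
    total = sum(w)
    return [wi / total for wi in w]

def rel_entropy(p, q):
    """D(p || q) = sum_k p_k log(p_k / q_k)."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)

def field(state, A, B):
    """Eq. (41) with alpha_X = alpha_Y = 0 and beta_X = beta_Y = 1."""
    u, v = state
    x, y = probs_from_info(u), probs_from_info(v)
    du = [-sum(a * yk for a, yk in zip(row, y)) for row in A]
    dv = [-sum(b * xk for b, xk in zip(row, x)) for row in B]
    return du, dv

def rk4_step(state, dt, A, B):
    def move(s, k, h):
        return tuple([a + h * b for a, b in zip(si, ki)] for si, ki in zip(s, k))
    k1 = field(state, A, B)
    k2 = field(move(state, k1, dt / 2), A, B)
    k3 = field(move(state, k2, dt / 2), A, B)
    k4 = field(move(state, k3, dt), A, B)
    return tuple([s + dt / 6 * (a + 2 * b + 2 * c + d) for s, a, b, c, d in zip(*parts)]
                 for parts in zip(state, k1, k2, k3, k4))

def energy(state, x_star, y_star):
    """E of Eq. (43) with beta_X = beta_Y = 1."""
    u, v = state
    return rel_entropy(x_star, probs_from_info(u)) + rel_entropy(y_star, probs_from_info(v))

eps = 0.5
A = [[-eps, eps], [eps, -eps]]
B = [[eps, -eps], [-eps, eps]]           # B = -A^T: zero-sum
star = [0.5, 0.5]
state = ([0.8, -0.8], [-0.5, 0.5])
E0 = energy(state, star, star)
for _ in range(5000):                    # t = 0 ... 50 with dt = 0.01
    state = rk4_step(state, 0.01, A, B)
print(E0, energy(state, star, star))     # agree to many decimal places
```

Over much longer runs a symplectic integrator, such as the one used for Figs. 12 and 13, is preferable, since the energy error of the Runge-Kutta scheme accumulates.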
[Figure 5: image not reproduced; its axes are labeled $D(\mathbf{x}^* \| \mathbf{x})$ and $D(\mathbf{y}^* \| \mathbf{y})$.]

FIG. 5: Dynamics of zero-sum interaction without memory loss: the constant of motion $E = \beta_X^{-1} D(\mathbf{x}^* \,\|\, \mathbf{x}) + \beta_Y^{-1} D(\mathbf{y}^* \,\|\, \mathbf{y})$ keeps fixed the linear sum of the distances between the interior Nash equilibrium and each agent's state.

Moreover, when $N = M$ the dynamics has a symplectic structure in $\mathbf{U}$, with the Hamiltonian $E$ given in Eq. (43) [32]. In this case, Eqs. (40) take the simple form

$$\dot{\mathbf{U}} = J\,\nabla_{\mathbf{U}} E , \tag{44}$$

with a Poisson structure $J$,

$$J = \begin{pmatrix} O & P \\ -P^T & O \end{pmatrix} \quad\text{with}\quad P = \beta_X\beta_Y A . \tag{45}$$

Again, see App. D.

When the bimatrix interaction $(A, B)$ satisfies $B = A^T$, $E$ is a Lyapunov function of the dynamics and decreases to 0 over time. In this case, each agent can adapt to the environment independently, and the collective adaptation dynamics reaches one of the stable states. The Nash equilibria $(\mathbf{x}^*, \mathbf{y}^*)$ need not lie in the interior of the collective simplices $\Delta$. Note that symmetric neural networks have similar properties [33].

In some cases when neither $B = -A^T$ nor $B = A^T$, $E$ increases non-monotonically, the dynamics in $\mathbf{U}$ diverges, and the Shannon entropies of the agents' choice distributions asymptotically decrease. (See Figs. 17 and [?] below.) By comparison, consider single-agent adaptation with state $\mathbf{x}$, with the environment's reinforcements normalized to a probability distribution $\mathbf{p}_e$. There $D(\mathbf{p}_e \,\|\, \mathbf{x})$ is always a Lyapunov function of the dynamics and decreases monotonically. In mutual adaptation, however, agents adapt to a dynamic environment that includes the other agents. As a result, in some cases $E$, a linear sum of agent relative entropies, itself exhibits nontrivial dynamics, and the uncertainties of the agents' choices asymptotically decrease.

When agents adapt with memory loss ($\alpha > 0$), the dynamics is dissipative. Since the memory loss terms induce information dissipation, the dynamics in information space varies between random and deterministic behavior. Notably, the agents may attempt to achieve this balance together by interacting, especially when the interaction has a nontransitive structure. The dynamics can then wander persistently in a bounded region of information space. In some cases mutual adaptation and memory loss produce successive stretching and folding, so deterministic chaos can occur over a significant range of $\alpha$, even with only two agents. A schematic view of the flow in mutual adaptation is given in Fig. 6.



When the agents are completely decoupled (or, for two agents, when $B = A^T$ and $\alpha_X = \alpha_Y = 0$), information space locally splits into subspaces. These are governed by the effects of mutual adaptation (information influx) and of memory loss (information dissipation), and they correspond to unstable and stable flow directions, as in single-agent adaptation. However, when the agents are coupled via a nontransitive interaction, mutual adaptation and memory loss affect each other, and a horseshoe can be produced. The flow of information is multidimensional. Each agent obtains information from its environment and organizes its behavior based on that information. That local adaptation is then fed back into the environment, where it affects the other agents.

In this case, "weak" uncertainty of behavior plays an important role in organizing the collective's behavior. Small fluctuations in decision making can be amplified through repeated mutual adaptation with competitive interactions. Dynamic memory stored in the collective could then exist, as indicated by a positive metric entropy.

[Figure 6: image not reproduced; panel labels "Adaptation and memory loss" and "Non-transitive interaction".]

FIG. 6: Schematic view of mutual adaptation: the effects of mutual adaptation and memory loss produce unstable and stable directions. The nontransitive structure of interactions leads to state-space folding.

Now consider many agents interacting. In the perfect memory case, when the game is zero-sum and has an interior Nash equilibrium $(\mathbf{x}^{1*}, \mathbf{x}^{2*}, \ldots, \mathbf{x}^{S*})$, the following constant of motion exists, by analogy with Eq. (43):

$$E = \sum_{s=1}^{S}\frac{1}{\beta_s} D(\mathbf{x}^{s*} \,\|\, \mathbf{x}^s) = \sum_{s=1}^{S}\frac{1}{\beta_s}\sum_{n^s=1}^{N^s} x^{s*}_{n^s}\log\frac{x^{s*}_{n^s}}{x^s_{n^s}} . \tag{46}$$

Strictly speaking, Hamiltonian dynamics and the associated symplectic structure of information space occur only for two agents. Even so, one can describe multiple-agent dynamics as a generalized Hamiltonian system [34]. In the general case with $\alpha > 0$, dissipative dynamics and high-dimensional chaotic flows can give rise to several unstable directions, since information influx has a network structure relative to the other agents. At least $S$ stable directions are expected, since memory loss comes from each individual's internal dynamics.

Summarizing, in single-agent adaptation information flows unidirectionally from the environment to the agent, and the agent adapts its behavior to the environmental constraints. Adaptation leads to $D(\mathbf{p}_e \,\|\, \mathbf{x}) \to 0$. For mutual adaptation in an agent collective, however, information flow is multidimensional, since each agent obtains information from an environment that includes the other agents. In this situation, $E$ need not be a Lyapunov function for the dynamics. As we will see, when the dynamics is chaotic, global information maximization is of doubtful utility, and the dynamic view of adaptation shown in Fig. 6 is more appropriate. When dynamic memory emerges in a collective, collective adaptation becomes a nontrivial problem. A detailed dynamical and information-theoretic analysis along these lines will be reported elsewhere.

In the next section we give several phenomenological examples that capture collective adaptation.



IV. EXAMPLES 

To illustrate collective adaptation, we now give several examples of the dynamics in a static environment. Two and three agents interact via versions of Matching Pennies and Rock-Scissors-Paper, games with nontransitive structures. App. E gives the details of the reinforcement schemes for these cases. The agents have equal adaptation rates ($\beta_X = \beta_Y = \cdots$) and the same number of actions ($N = M = L = \cdots$). In these simplified cases, the equations of motion for two agents are

$$\begin{aligned}
\frac{\dot{x}_i}{x_i} &= \left[(A\mathbf{y})_i - \mathbf{x}\cdot A\mathbf{y}\right] + \alpha_X\left[-\log x_i + \sum_{n=1}^{N} x_n\log x_n\right],\\
\frac{\dot{y}_j}{y_j} &= \left[(B\mathbf{x})_j - \mathbf{y}\cdot B\mathbf{x}\right] + \alpha_Y\left[-\log y_j + \sum_{m=1}^{M} y_m\log y_m\right],
\end{aligned} \tag{47}$$

for $i, j = 1, \ldots, N$. A detailed analysis of this case with zero memory loss ($\alpha = 0$) is given in Ref. [?] in terms of asymmetric game dynamics. We present results for both zero and positive memory loss rates.

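We then consider three agents, for which the adaptation equations are

$$\begin{aligned}
\frac{\dot{x}_i}{x_i} &= \left[(A\mathbf{y}\mathbf{z})_i - \mathbf{x}\cdot A\mathbf{y}\mathbf{z}\right] + \alpha_X\left[-\log x_i + \sum_{n=1}^{N} x_n\log x_n\right],\\
\frac{\dot{y}_j}{y_j} &= \left[(B\mathbf{z}\mathbf{x})_j - \mathbf{y}\cdot B\mathbf{z}\mathbf{x}\right] + \alpha_Y\left[-\log y_j + \sum_{m=1}^{M} y_m\log y_m\right],\\
\frac{\dot{z}_k}{z_k} &= \left[(C\mathbf{x}\mathbf{y})_k - \mathbf{z}\cdot C\mathbf{x}\mathbf{y}\right] + \alpha_Z\left[-\log z_k + \sum_{l=1}^{L} z_l\log z_l\right],
\end{aligned} \tag{48}$$

for $i, j, k = 1, \ldots, N$. Again, we describe cases both with and without memory loss.

Computer simulations are carried out in the information space $\mathbf{U}$, and the results are shown in the state space $\mathbf{X}$. We ignore the dynamics on the boundary of the simplex and concentrate on the case in which all variables are greater than 0 and less than 1.

A. Two Agents Adapting under Matching Pennies Interaction

In the Matching Pennies game, agents play one of two actions: heads ($H$) or tails ($T$). Agent $X$ wins when the plays do not agree; agent $Y$ wins when they do. Agent $X$'s state space is $\Delta_X = \{(x_1, x_2)\}$ with $x_i \in (0, 1)$ and $x_1 + x_2 = 1$. That is, $x_1$ is the probability that agent $X$ plays heads, and $x_2$ the probability that it plays tails. Agent $Y$ is described similarly. Thus each agent's state space is effectively one dimensional, and the collective state space $\Delta = \Delta_X \times \Delta_Y$ is two dimensional.

For two agents interacting via the Matching Pennies game, the environment gives the following matrices in Eqs. (47):

$$A = \begin{pmatrix} -\epsilon_X & \epsilon_X \\ \epsilon_X & -\epsilon_X \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} -\epsilon_Y & \epsilon_Y \\ \epsilon_Y & -\epsilon_Y \end{pmatrix} , \tag{49}$$

where $\epsilon_X \in (0.0, 1.0]$ and $-\epsilon_Y \in (0.0, 1.0]$.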

Figure 7 shows a heteroclinic cycle of the adaptation dynamics on the boundary of $\Delta$ when the $\alpha$'s vanish. Flows on the border occur only when the agents completely ignore an action in the initial state, that is, when $x_i(0) = 0$ or $y_j(0) = 0$ for at least one $i$ or $j$. Each vertex of the simplex is a saddle, since the interaction is nontransitive.

[Figure 7: image not reproduced; the square $\Delta$ has corners labeled $(H,H)$, $(H,T)$, $(T,H)$, and $(T,T)$.]

FIG. 7: Flows on the boundary in the Matching Pennies interaction: actions $H$ and $T$ correspond to "heads" and "tails", respectively. Arrows indicate the direction of the adaptation dynamics on the boundary of the state space $\Delta$.

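The Nash equilibrium $(\mathbf{x}^*, \mathbf{y}^*)$ of the Matching Pennies game lies at the center of $\Delta$, $(\mathbf{x}^*, \mathbf{y}^*) = (\tfrac12, \tfrac12, \tfrac12, \tfrac12)$, and it is also a fixed point of the adaptation dynamics. The Jacobian at $(\mathbf{x}^*, \mathbf{y}^*)$ is

$$J = \begin{pmatrix} -\dfrac{\alpha_X}{2}(1+\log 2) & -\dfrac{\epsilon_X}{2} \\[1ex] -\dfrac{\epsilon_Y}{2} & -\dfrac{\alpha_Y}{2}(1+\log 2) \end{pmatrix} \tag{50}$$

and its eigenvalues are

$$\frac{4\lambda_i}{1+\log 2} = -(\alpha_X + \alpha_Y) \pm \sqrt{(\alpha_X - \alpha_Y)^2 + \frac{4\epsilon_X\epsilon_Y}{(1+\log 2)^2}} . \tag{51}$$

In the perfect memory case ($\alpha_X = \alpha_Y = 0$), trajectories near $(\mathbf{x}^*, \mathbf{y}^*)$ are neutrally stable periodic orbits. This is because $\lambda_i = \pm\frac12\sqrt{\epsilon_X\epsilon_Y}$ are pure imaginary, recalling that $\epsilon_X\epsilon_Y < 0$. In the memory loss case ($\alpha_X > 0$ and $\alpha_Y > 0$), $(\mathbf{x}^*, \mathbf{y}^*)$ is globally asymptotically stable, since $\mathrm{Re}(\lambda_1)$ and $\mathrm{Re}(\lambda_2)$ are strictly negative. Examples of trajectories in these two cases are given in Fig. 8.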
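The eigenvalue formula (51) follows from the $2 \times 2$ matrix (50) by the usual trace-determinant argument. The sketch below compares the two numerically, using Python's `cmath` so that complex eigenvalues are handled. It checks only that (51) is the eigenvalue formula of matrix (50), not how (50) itself was obtained. The function names are hypothetical, and the parameter values are those of Fig. 8.

```python
import cmath
import math

K = 1.0 + math.log(2.0)

def jacobian_eq50(eps_x, eps_y, alpha_x, alpha_y):
    """The matrix of Eq. (50)."""
    return [[-alpha_x * K / 2, -eps_x / 2],
            [-eps_y / 2, -alpha_y * K / 2]]

def eig2(J):
    """Eigenvalues of a 2 x 2 matrix: (a + d)/2 +/- sqrt(((a - d)/2)^2 + b c)."""
    (a, b), (c, d) = J
    disc = cmath.sqrt(((a - d) / 2) ** 2 + b * c)
    return (a + d) / 2 + disc, (a + d) / 2 - disc

def eig_eq51(eps_x, eps_y, alpha_x, alpha_y):
    """Closed form of Eq. (51), solved for lambda_1 and lambda_2."""
    root = cmath.sqrt((alpha_x - alpha_y) ** 2 + 4 * eps_x * eps_y / K ** 2)
    return tuple(K / 4 * (-(alpha_x + alpha_y) + s * root) for s in (1, -1))

for alphas in [(0.0, 0.0), (0.02, 0.01)]:
    print(alphas, eig2(jacobian_eq50(0.5, -0.3, *alphas)), eig_eq51(0.5, -0.3, *alphas))
```

For $\alpha_X = \alpha_Y = 0$ both routes give the pure imaginary pair $\pm\frac12\sqrt{\epsilon_X\epsilon_Y}$. With $\alpha_X, \alpha_Y > 0$ the real parts become negative.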
FIG. 8: Adaptation dynamics in the Matching Pennies interaction: here $\epsilon_X = 0.5$ and $\epsilon_Y = -0.3$, with (left) $\alpha_X = \alpha_Y = 0$ and (right) $\alpha_X = 0.02$ and $\alpha_Y = 0.01$.

B. Three Agents Adapting under Even-Odd Interaction

Now consider extending Matching Pennies for two agents so that it determines the interactions among three. Here we introduce the Even-Odd interaction. There are again two actions, $H$ and $T$, but agents win according to whether the number of heads among the three agents' plays is even or odd. For agent $X$, the environment is now given by

$$a_{ijk} = \begin{cases} \epsilon_X, & \text{the number of } H\text{s is even},\\ -\epsilon_X, & \text{otherwise}, \end{cases} \tag{52}$$

with actions for agents $X$, $Y$, and $Z$ given by $i, j, k \in \{H, T\}$ and $\epsilon_X \in (0.0, 1.0]$. The interaction tensors $b_{jki}$ and $c_{kij}$ for agents $Y$ and $Z$, respectively, are given similarly, but with $\epsilon_Y \in (0.0, 1.0]$ and $\epsilon_Z \in [-1.0, 0.0)$. App. E gives the details of the reinforcement scheme.

Following the reasoning used for Matching Pennies, the collective state space $\Delta = \Delta_X \times \Delta_Y \times \Delta_Z$ is now a solid three-dimensional cube. Figure 9 shows the flows on $\Delta$'s boundary when the $\alpha$'s vanish; they form a heteroclinic network of the adaptation dynamics. $\Delta$ is partitioned into four prism-shaped subspaces. Each prism has a heteroclinic cycle on the prism face that is also a face of $\Delta$.

[Figure 9: image not reproduced; the cube $\Delta$ has vertices labeled by joint actions such as $(H,H,H)$, $(H,H,T)$, $(H,T,T)$, $(T,H,H)$, and $(T,T,T)$.]

FIG. 9: Flows on the state space boundary under the Even-Odd interaction: $H$ and $T$ correspond to "heads" and "tails", respectively. Arrows indicate the direction of the adaptation dynamics on $\Delta$'s boundary when the $\alpha$'s vanish.

The Nash equilibrium of the Even-Odd interaction is $(\mathbf{x}^*, \mathbf{y}^*, \mathbf{z}^*) = (\tfrac12, \tfrac12, \tfrac12, \tfrac12, \tfrac12, \tfrac12)$ at the center of $\Delta$, and it is also a fixed point of the adaptation dynamics. The Jacobian there is

$$J = \begin{pmatrix} -\alpha_X & 0 & 0 \\ 0 & -\alpha_Y & 0 \\ 0 & 0 & -\alpha_Z \end{pmatrix} . \tag{53}$$

Its eigenvalues are $\lambda = -\alpha_X, -\alpha_Y, -\alpha_Z$. Thus, in the complete memory case ($\alpha_X = \alpha_Y = \alpha_Z = 0$), trajectories near $(\mathbf{x}^*, \mathbf{y}^*, \mathbf{z}^*)$ are neutrally stable periodic orbits. With memory decay ($\alpha_X, \alpha_Y, \alpha_Z > 0$), $(\mathbf{x}^*, \mathbf{y}^*, \mathbf{z}^*)$ is globally asymptotically stable. The hyperbolic fixed points in the top and bottom faces are unstable in all cases. Examples of trajectories are given in Fig. 10.
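The Jacobian (53) can be checked by finite differences. The sketch below builds the Even-Odd tensors of Eq. (52) and writes Eqs. (48) in the reduced coordinates $(x_1, y_1, z_1)$, using $x_2 = 1 - x_1$ and so on. That (53) is expressed in these coordinates is an assumption. The code then differentiates numerically at the center of the cube. The $\epsilon$'s follow Fig. 10, the memory loss rates are arbitrary test values, and all function names are hypothetical.

```python
import math

def even_odd_tensor(eps):
    """t[i][j][k] = eps if the number of heads (index 0) among i, j, k is even, else -eps."""
    return [[[eps if (i, j, k).count(0) % 2 == 0 else -eps
              for k in range(2)] for j in range(2)] for i in range(2)]

def agent_rate(p, T, q, r, alpha):
    """d p_0 / dt from Eqs. (48) for one two-action agent; q and r are the other agents' states."""
    pay = [sum(T[i][m][l] * q[m] * r[l] for m in range(2) for l in range(2)) for i in range(2)]
    mean_pay = p[0] * pay[0] + p[1] * pay[1]
    sum_plogp = p[0] * math.log(p[0]) + p[1] * math.log(p[1])
    return p[0] * ((pay[0] - mean_pay) + alpha * (-math.log(p[0]) + sum_plogp))

def reduced_field(s, tensors, alphas):
    x, y, z = ([c, 1.0 - c] for c in s)
    A, B, C = tensors
    ax, ay, az = alphas
    return [agent_rate(x, A, y, z, ax),    # uses (A y z)_i
            agent_rate(y, B, z, x, ay),    # uses (B z x)_j
            agent_rate(z, C, x, y, az)]    # uses (C x y)_k

def numerical_jacobian(s, h, *args):
    """Central differences; returns J[row][col] = d field_row / d s_col."""
    cols = []
    for k in range(3):
        sp, sm = list(s), list(s)
        sp[k] += h
        sm[k] -= h
        fp, fm = reduced_field(sp, *args), reduced_field(sm, *args)
        cols.append([(a - b) / (2 * h) for a, b in zip(fp, fm)])
    return [[cols[c][r] for c in range(3)] for r in range(3)]

tensors = (even_odd_tensor(0.5), even_odd_tensor(0.2), even_odd_tensor(-0.3))
alphas = (0.02, 0.01, 0.03)
J = numerical_jacobian([0.5, 0.5, 0.5], 1e-5, tensors, alphas)
for row in J:
    print([round(v, 6) for v in row])   # approximately diag(-0.02, -0.01, -0.03)
```

At the center every agent's relative payoff vanishes to first order in the other agents' deviations, so only the memory loss terms survive in the Jacobian.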

Notably, when a single agent (say, $Z$) has memory loss and the others have perfect memory, the crossed lines $\{x_1 = z_1 = 0.5\}$ and $\{y_1 = z_1 = 0.5\}$ become an invariant subspace, and trajectories are attracted to points in this subspace. Thus, there are infinitely many neutrally stable points.

With $\alpha_X = \alpha_Y = 0$ and $\alpha_Z = 0.01$, for example, the adaptive dynamics alternates between a Matching Pennies interaction between agents $X$ and $Z$ and one between agents $Y$ and $Z$ during the transient relaxation to a point on the invariant subspace.





FIG. 10: Dynamics of adaptation in the Even-Odd interaction: $\epsilon_X = 0.5$, $\epsilon_Y = 0.2$, and $\epsilon_Z = -0.3$, with $\alpha_X = \alpha_Y = \alpha_Z = 0$ (left) and with $\alpha_X = \alpha_Y = 0$ and $\alpha_Z = 0.01$ (right). Trajectories from several initial conditions are shown on the left. On the right, the neutral subspace is shown as the horizontal cross, and the chosen trajectory illustrates the attraction to a point in this subspace.



C. Two Agents Adapting under 
Rock-Scissors-Paper Interaction 

In this subsection we give an example of an environment in which agents have three actions. One of the most commonly studied three-action games is Rock-Scissors-Paper (RSP). Rock beats Scissors, Scissors beats Paper, and Paper beats Rock.

We first examine two agents, which is a straightforward implementation of the RSP game. We then extend the RSP interaction to three agents and analyze the higher-dimensional behavior. The interaction matrices for these cases are given in App. E.

Under the RSP interaction each agent may play one of three actions: "rock" ($R$), "scissors" ($S$), and "paper" ($P$). Agent $X$'s probabilities of playing these are denoted $x_1$, $x_2$, and $x_3$, with $x_1 + x_2 + x_3 = 1$. Agent $Y$'s probabilities are defined similarly. Thus the agent state spaces $\Delta_X$ and $\Delta_Y$ are each two-dimensional simplices, and the collective state space $\Delta = \Delta_X \times \Delta_Y$ is four dimensional.

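For two agents the environment is given by the interaction matrices

$$A = \begin{pmatrix} \epsilon_X & 1 & -1 \\ -1 & \epsilon_X & 1 \\ 1 & -1 & \epsilon_X \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} \epsilon_Y & 1 & -1 \\ -1 & \epsilon_Y & 1 \\ 1 & -1 & \epsilon_Y \end{pmatrix} , \tag{54}$$

where $\epsilon_X, \epsilon_Y \in [-1.0, 1.0]$ are the rewards for ties. These are normalized to

$$A' = \begin{pmatrix} \tfrac23\epsilon_X & 1-\tfrac13\epsilon_X & -1-\tfrac13\epsilon_X \\ -1-\tfrac13\epsilon_X & \tfrac23\epsilon_X & 1-\tfrac13\epsilon_X \\ 1-\tfrac13\epsilon_X & -1-\tfrac13\epsilon_X & \tfrac23\epsilon_X \end{pmatrix} \tag{55}$$

and

$$B' = \begin{pmatrix} \tfrac23\epsilon_Y & 1-\tfrac13\epsilon_Y & -1-\tfrac13\epsilon_Y \\ -1-\tfrac13\epsilon_Y & \tfrac23\epsilon_Y & 1-\tfrac13\epsilon_Y \\ 1-\tfrac13\epsilon_Y & -1-\tfrac13\epsilon_Y & \tfrac23\epsilon_Y \end{pmatrix} . \tag{56}$$

Note that the reinforcements are normalized to zero mean. This does not affect the dynamics: subtracting the column means $\epsilon_X/3$ (respectively $\epsilon_Y/3$) shifts $(A\mathbf{y})_i$ and $\mathbf{x}\cdot A\mathbf{y}$ by the same amount.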
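The zero-mean normalization of Eqs. (55) and (56) amounts to subtracting each column's mean from the payoffs. The sketch below, with hypothetical helper names, builds $A$ and $A'$ and checks two things. Every column of $A'$ sums to zero, and the relative payoffs $(A\mathbf{y})_i - \mathbf{x}\cdot A\mathbf{y}$ that drive Eq. (47) are unchanged. The test points are arbitrary.

```python
def rsp_matrix(eps):
    """Rock-Scissors-Paper payoffs of Eq. (54) with tie reward eps."""
    return [[eps, 1.0, -1.0],
            [-1.0, eps, 1.0],
            [1.0, -1.0, eps]]

def zero_mean_columns(M):
    """Subtract each column's mean so that every column of the result sums to zero."""
    n_rows, n_cols = len(M), len(M[0])
    means = [sum(M[r][c] for r in range(n_rows)) / n_rows for c in range(n_cols)]
    return [[M[r][c] - means[c] for c in range(n_cols)] for r in range(n_rows)]

def relative_payoffs(M, x, y):
    """(M y)_i - x . M y, the bracket in Eq. (47)."""
    My = [sum(m * yk for m, yk in zip(row, y)) for row in M]
    mean = sum(xi * v for xi, v in zip(x, My))
    return [v - mean for v in My]

A = rsp_matrix(0.5)
A_norm = zero_mean_columns(A)
print(A_norm[0])   # [2/3 * 0.5, 1 - 0.5/3, -1 - 0.5/3]
x, y = [0.2, 0.3, 0.5], [0.6, 0.1, 0.3]
print(relative_payoffs(A, x, y))
print(relative_payoffs(A_norm, x, y))   # same values up to rounding
```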

The flow on $\Delta$'s boundary is shown in Fig. 11. It represents the heteroclinic network of the adaptation dynamics on $\Delta$'s edges when the $\alpha$'s vanish. Each vertex is a saddle, since the interaction has a nontransitive structure.

[Figure 11: image not reproduced; vertex labels include $(S,S)$, $(P,S)$, and $(P,P)$, with axes $x_2$ and $y_2$.]

FIG. 11: Flows on the boundary of the simplex in the Rock-Scissors-Paper interaction for two agents: $R$, $S$, and $P$ denote "rock", "scissors", and "paper", respectively. The arrows indicate the direction of the adaptation dynamics on the boundary of the collective state space $\Delta$ when the $\alpha$'s vanish.

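The Nash equilibrium $(\mathbf{x}^*, \mathbf{y}^*)$ is given by the centers of the simplices:

$$(\mathbf{x}^*, \mathbf{y}^*) = \left(\tfrac13, \tfrac13, \tfrac13, \tfrac13, \tfrac13, \tfrac13\right) . \tag{57}$$

This is also a fixed point of the adaptation dynamics. The Jacobian there is

$$J = \begin{pmatrix} -\alpha_X & 0 & \frac{1+\epsilon_X}{3} & \frac23 \\ 0 & -\alpha_X & -\frac23 & -\frac{1-\epsilon_X}{3} \\ \frac{1+\epsilon_Y}{3} & \frac23 & -\alpha_Y & 0 \\ -\frac23 & -\frac{1-\epsilon_Y}{3} & 0 & -\alpha_Y \end{pmatrix} . \tag{58}$$

Its eigenvalues are

$$2\lambda_i = -(\alpha_X + \alpha_Y) \pm \sqrt{(\alpha_X - \alpha_Y)^2 + \frac{4}{9}\left(\epsilon_X\epsilon_Y - 3 \pm \sqrt{-3(\epsilon_X + \epsilon_Y)^2}\right)} . \tag{59}$$

Thus, when $(A, B)$ is zero-sum ($\epsilon_X + \epsilon_Y = 0$) and the agents have complete memory ($\alpha_X = \alpha_Y = 0$), all the $\lambda$'s are pure imaginary, so trajectories near $(\mathbf{x}^*, \mathbf{y}^*)$ are neutrally stable periodic orbits. The dynamics is Hamiltonian in this case. With memory decay ($\alpha_X, \alpha_Y > 0$) and $|\alpha_X - \alpha_Y| < \frac23\sqrt{\epsilon_X^2 + 3}$, $(\mathbf{x}^*, \mathbf{y}^*)$ is globally asymptotically stable.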
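As a consistency check on Eqs. (58) and (59), the sketch below verifies that every $\lambda$ from the closed form makes $\det(J - \lambda I)$ vanish. It assumes that $J$ in Eq. (58) is written in the coordinates $(x_1, x_2, y_1, y_2)$, with $x_3$ and $y_3$ eliminated. A small hand-written Gaussian-elimination determinant avoids any third-party library. The helper names are hypothetical, and the parameter values are those used in the text.

```python
import cmath

def rsp_jacobian(eps_x, eps_y, alpha_x, alpha_y):
    """Eq. (58): Jacobian at the center, coordinates (x1, x2, y1, y2)."""
    return [[-alpha_x, 0.0, (1 + eps_x) / 3, 2 / 3],
            [0.0, -alpha_x, -2 / 3, -(1 - eps_x) / 3],
            [(1 + eps_y) / 3, 2 / 3, -alpha_y, 0.0],
            [-2 / 3, -(1 - eps_y) / 3, 0.0, -alpha_y]]

def det(M):
    """Determinant by Gaussian elimination with partial pivoting (complex entries allowed)."""
    M = [list(row) for row in M]
    n, result = len(M), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if M[p][c] == 0:
            return 0.0
        if p != c:
            M[c], M[p] = M[p], M[c]
            result = -result
        result *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return result

def eig_eq59(eps_x, eps_y, alpha_x, alpha_y):
    """All four eigenvalues from the closed form, Eq. (59)."""
    lams = []
    for inner in (1, -1):
        mu = (eps_x * eps_y - 3 + inner * cmath.sqrt(-3 * (eps_x + eps_y) ** 2)) / 9
        root = cmath.sqrt((alpha_x - alpha_y) ** 2 + 4 * mu)
        lams += [(-(alpha_x + alpha_y) + outer * root) / 2 for outer in (1, -1)]
    return lams

params = (0.5, -0.3, 0.02, 0.01)
J = rsp_jacobian(*params)
for lam in eig_eq59(*params):
    shifted = [[J[r][c] - (lam if r == c else 0) for c in range(4)] for r in range(4)]
    print(lam, abs(det(shifted)))   # the determinant vanishes up to rounding
```

The check works because $J$ has the block form $\begin{pmatrix} -\alpha_X I & P \\ Q & -\alpha_Y I \end{pmatrix}$. Its eigenvalues therefore satisfy $(\lambda + \alpha_X)(\lambda + \alpha_Y) = \mu$ for each eigenvalue $\mu$ of $PQ$, which is where the inner $\pm$ in Eq. (59) comes from.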

For the nonzero-sum case, we give examples of the dynamics with $\epsilon_X = 0.5$, $\epsilon_Y = -0.3$, and $\alpha_Y = 0.01$. In this case, $(\mathbf{x}^*, \mathbf{y}^*)$ is globally asymptotically stable when $\alpha_X > \alpha_c$. At $\alpha_c \approx 0.055008938$ a period-doubling bifurcation occurs. Two agents adapting under the Rock-Scissors-Paper interaction illustrate various types of low-dimensional chaos. We now explore several cases.



1. Hamiltonian Limit 

When the agents' memories are perfect ($\alpha_X = \alpha_Y = 0$) and the game is zero-sum ($\epsilon_X = -\epsilon_Y$), the dynamics in the information space $\mathbf{U}$ is Hamiltonian. Its Hamiltonian consists of relative entropies, $E = D(\mathbf{x}^* \,\|\, \mathbf{x}) + D(\mathbf{y}^* \,\|\, \mathbf{y})$. The left columns of Figs. 12 and 13 show trajectories in the collective state space $\Delta$. The middle and right columns show the same trajectories projected onto the individual agent simplices, $\Delta_X$ and $\Delta_Y$. The trajectories were generated in $\mathbf{U}$ using a fourth-order symplectic integrator [?].

When $\epsilon_X = -\epsilon_Y = 0.0$ the dynamics appears to be integrable, since in our computer simulations only quasiperiodic tori exist for almost all initial conditions. For some initial conditions the torus is knotted into a trefoil. When $\epsilon_X = -\epsilon_Y > 0.0$, however, Hamiltonian chaos occurs, with positive-negative pairs of Lyapunov exponents (see Table I). The game-theoretic behavior of this example was investigated briefly in Ref. [16]. The dynamics is very rich. For example, there are infinitely many distinct behaviors near the fixed point at the center (the interior Nash equilibrium), and there is a periodic orbit arbitrarily close to any chaotic one.

A more detailed view of the complex dynamics is given in Fig. 14, which shows Poincaré sections of the trajectories of Eqs. (47). The Poincaré section is given by $\dot{u}_3 > 0$ and $\dot{v}_3 = 0$. In $(\mathbf{x}, \mathbf{y})$ space the section is determined by the constraints

$$\begin{aligned}
(1 - \epsilon_X)\,y_1 - (1 + \epsilon_X)\,y_2 + \tfrac23\epsilon_X &< 0 ,\\
(1 - \epsilon_Y)\,x_1 - (1 + \epsilon_Y)\,x_2 + \tfrac23\epsilon_Y &= 0 .
\end{aligned} \tag{60}$$

These sections are indicated by the straight lines drawn in the $\Delta_X$ simplices of Figs. 12 and 13. In Fig. 14, when $\epsilon_X = -\epsilon_Y = 0.0$, closed loops that depend on the initial conditions show tori in the Poincaré section. When $\epsilon_X = -\epsilon_Y = 0.5$, some tori collapse and become chaotic. The scatter of dots among the remaining closed loops is characteristic of Hamiltonian chaos.

FIG. 12: Quasiperiodic tori: collective dynamics in $\Delta$ (left column) and individual dynamics projected onto $\Delta_X$ and $\Delta_Y$, respectively (right two columns). Here $\epsilon_X = -\epsilon_Y = 0.0$ and $\alpha_X = \alpha_Y = 0$. The initial condition is (A) $(\mathbf{x}, \mathbf{y}) = (0.26, 0.113333, 0.626667, 0.165, 0.772549, 0.062451)$ for the top row and (B) $(\mathbf{x}, \mathbf{y}) = (0.05, 0.35, 0.6, 0.1, 0.2, 0.7)$ for the bottom row. The constant of motion (Hamiltonian) is $E = 0.74446808 = E_0$. The Poincaré section used for Fig. 14 is given by $x_1 = x_2$ and $y_1 < y_2$; it is indicated here as the straight diagonal line in agent $X$'s simplex $\Delta_X$.

FIG. 13: Quasiperiodic tori and chaos: collective dynamics in $\Delta$ (left column) and individual dynamics projected onto $\Delta_X$ and $\Delta_Y$, respectively (right two columns). Here $\epsilon_X = -\epsilon_Y = 0.5$ and $\alpha_X = \alpha_Y = 0$. The initial conditions are the same as in Fig. 12: (A) for the top row and (B) for the bottom row. The constant of motion is also the same, $E = E_0$. The Poincaré section is given by $3x_1 - x_2 - 2/3 = 0$ and $y_1 - 3y_2 + 2/3 < 0$; it is indicated as a straight line in $\Delta_X$.

Table I shows Lyapunov spectra in $\mathbf{U}$ for the dynamics with $\epsilon_X = -\epsilon_Y = 0.0$ and with $\epsilon_X = -\epsilon_Y = 0.5$. The initial condition is $(\mathbf{x}(0), \mathbf{y}(0)) = (x_1, 0.35, 0.65 - x_1, 0.1, y_2, 0.9 - y_2)$, with $E = E_0 = 0.74446808$ fixed. $(x_1, y_2)$ satisfies

$$e^{-3(E_0 + 2\log 3)} = 0.035\, x_1 (0.65 - x_1)\, y_2 (0.9 - y_2) . \tag{61}$$

When $x_1(0) = 0.05$, this is initial condition (B), $(\mathbf{x}, \mathbf{y}) = (0.05, 0.35, 0.6, 0.1, 0.2, 0.7)$, used in the preceding examples. When $\epsilon_X = 0.5$, the Lyapunov exponents form positive-negative pairs for $x_1(0) = 0.05$, $0.06$, and $0.08$, which clearly shows Hamiltonian chaos. Note that $\lambda_2 \approx 0.0$, $\lambda_3 \approx 0.0$, and $\lambda_4 \approx -\lambda_1$, as expected.

FIG. 14: Poincaré sections of the behavior in the preceding two figures: $\epsilon_X = -\epsilon_Y = 0.0$ (left) and $\epsilon_X = -\epsilon_Y = 0.5$ (right). The Poincaré section is given by $x_1 = x_2$ and $y_1 < y_2$ (left) and by $3x_1 - x_2 - 2/3 = 0$ and $y_1 - 3y_2 + 2/3 < 0$ (right). There are 25 randomly selected initial conditions, including the two, (A) and (B), used in Figs. 12 and 13. The constant of motion ($E = E_0$) forms the outer border of the Poincaré sections.
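Equation (61) follows from writing $E = D(\mathbf{x}^* \,\|\, \mathbf{x}) + D(\mathbf{y}^* \,\|\, \mathbf{y})$ with $\mathbf{x}^* = \mathbf{y}^* = (1/3, 1/3, 1/3)$, which gives $E = -2\log 3 - \frac13\log(x_1x_2x_3y_1y_2y_3)$. The sketch below uses this to recover $y_2$ from $x_1$ at fixed $E_0$, taking the smaller root of the resulting quadratic in $y_2$. The helper names are hypothetical. The numbers $x_2 = 0.35$, $y_1 = 0.1$, and $E_0 = 0.74446808$ are those quoted in the text.

```python
import math

LOG3 = math.log(3.0)

def energy(x, y):
    """E = D(x*||x) + D(y*||y) with x* = y* = (1/3, 1/3, 1/3)."""
    return -2.0 * LOG3 - sum(math.log(p) for p in x + y) / 3.0

def y2_from_x1(x1, E0, x2=0.35, y1=0.1):
    """Solve Eq. (61) for y2 given x1, with x3 = 1 - x1 - x2 and y3 = 1 - y1 - y2."""
    product = math.exp(-3.0 * (E0 + 2.0 * LOG3))       # equals x1 x2 x3 y1 y2 y3
    c = product / (x1 * x2 * (1.0 - x1 - x2) * y1)     # equals y2 (1 - y1 - y2)
    s = 1.0 - y1                                       # y2 + y3
    disc = s * s - 4.0 * c
    if disc < 0:
        raise ValueError("no real y2 for this x1 at the given E0")
    return (s - math.sqrt(disc)) / 2.0                 # smaller root, as in Table I

E0 = 0.74446808
print(energy([0.05, 0.35, 0.6], [0.1, 0.2, 0.7]))  # about 0.74446808: condition (B)
for x1 in (0.06, 0.07, 0.08, 0.09, 0.10):
    print(x1, round(y2_from_x1(x1, E0), 6))
```

The printed pairs reproduce the $(x_1, y_2)$ initial conditions listed in the caption of Table I to the quoted precision.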
ε_X    λ     x1(0) = 0.05   0.06     0.07     0.08     0.09     0.10
0.0    λ1    +0.881         +0.551   +0.563   +0.573   +0.575   +0.589
       λ2    +0.436         +0.447   +0.464   +0.467   +0.460   +0.461
       λ3    -0.436         -0.447   -0.464   -0.467   -0.460   -0.461
       λ4    -0.881         -0.551   -0.563   -0.573   -0.575   -0.589
0.5    λ1    +36.4*         +41.5*   +0.487   +26.3*   +0.575   +0.487
       λ2    +0.543         +0.666   +0.204   +0.350   +0.460   +0.460
       λ3    -0.637         -0.666   -0.197   -0.338   -0.460   -0.467
       λ4    -36.3*         -41.5*   -0.494   -26.3*   -0.575   -0.480

TABLE I: Lyapunov spectra for different initial conditions (columns) and different values of the tie-breaking parameter $\epsilon_X$. The initial conditions are $(x_1, x_2, x_3, y_1, y_2, y_3) = (x_1, 0.35, 0.65 - x_1, 0.1, y_2, 0.9 - y_2)$ with $E = E_0 = 0.74446808$ fixed. We choose the initial conditions $(x_1, y_2) = (0.05, 0.2)$, $(0.06, 0.160421)$, $(0.07, 0.135275)$, $(0.08, 0.117743)$, $(0.09, 0.104795)$, and $(0.10, 0.0948432)$. The Lyapunov exponents are multiplied by $10^{?}$. Note that $\lambda_2 \approx 0.0$, $\lambda_3 \approx 0.0$, and $\lambda_4 \approx -\lambda_1$, as expected. The Lyapunov exponents indicating chaos are marked with an asterisk.

2. Conservative Dynamics

With perfect memory ($\alpha_X = \alpha_Y = 0$) and a game that is not zero-sum ($\epsilon_X \neq -\epsilon_Y$), the dynamics is conservative in $\mathbf{U}$, and one observes transients that are attracted to heteroclinic networks in the state space $\mathbf{X}$ (see Fig. 15).

FIG. 15: Heteroclinic cycle with $\epsilon_X = -0.1$ and $\epsilon_Y = 0.05$ (top row). Chaotic transient to a heteroclinic network with $\epsilon_X = 0.1$ and $\epsilon_Y = -0.05$ (bottom row). For both, $\alpha_X = \alpha_Y = 0$.

When εX + εY < 0, the behavior is intermittent and orbits are guided by the flow on Δ's edges, which describes a network of possible heteroclinic cycles. Since action ties are not rewarded, there is only one such cycle. It is shown in the top row of Fig. 15: (R, P) → (S, P) → (S, R) → (P, R) → (P, S) → (R, S) → (R, P). Note that during the cycle each agent switches between almost deterministic actions in the order R → S → P. The agents are out of phase with respect to each other and they alternate winning each turn.

FIG. 16: Time series of action probabilities during the heteroclinic cycles of Fig. 15: εX = −0.1 and εY = 0.05 for the left column. The right column shows the chaotic transient to a possible heteroclinic cycle when εX = 0.1 and εY = −0.05. For both, αX = αY = 0.

FIG. 17: Dynamics of H^X, H^Y, and E in conservative adaptive dynamics: εX = −0.1 and εY = 0.05 for the left plot and εX = 0.1 and εY = −0.05 for the right. For both, αX = αY = 0. Note that E increases asymptotically and H^X and H^Y tend to decrease.

With εX + εY > 0, however, the orbit is an infinitely persistent chaotic transient. Since, in this case, agent X can choose a tie, the cycles are not closed. For example, with εX > 0, at (R, P), X has the option of moving to (P, P) instead of (S, P) with a positive probability. This embeds an instability along the heteroclinic cycle and so orbits are chaotic. (See Fig. 15, bottom row.)

Figure 16 shows the time series for these behaviors. Usually, in transient relaxation to a heteroclinic cycle, the duration over which orbits stay near saddle vertices increases exponentially. In our case, however, it appears to increase subexponentially. This is because the exponent is very small: (1 + δ)^n ≈ 1 + nδ + ⋯ for δ ≪ 1. In the second, chaotic transient case, the duration still increases subexponentially, but the visited vertices change irregularly.
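The subexponential appearance can be illustrated with a toy calculation. The sketch below assumes, for illustration only, that the n-th residence time near a saddle grows as t_n = t0 (1 + δ)^n; `residence_times` and `max_relative_gap` are hypothetical helpers, not part of the model. It compares this geometric growth with its first-order form 1 + nδ.

```python
def residence_times(delta, n_max, t0=1.0):
    """Toy residence times t_n = t0 * (1 + delta)**n for n = 0..n_max (illustrative only)."""
    return [t0 * (1.0 + delta) ** n for n in range(n_max + 1)]


def max_relative_gap(delta, n_max):
    """Largest relative difference between (1 + delta)**n and its first-order form 1 + n*delta."""
    times = residence_times(delta, n_max)
    return max(abs(t - (1.0 + n * delta)) / t for n, t in enumerate(times))


delta = 1e-3
print(max_relative_gap(delta, 100))    # well under 1%: growth looks linear
print(max_relative_gap(delta, 2000))   # once n*delta is of order one, the exponential shows
```

For δ = 10^{-3} the two stay within about half a percent over the first hundred visits, so a finite record looks linear; the exponential only becomes visible once nδ is of order one.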

Figure 17 shows the behavior of H^X, H^Y, and E. In both cases E eventually increases monotonically, and H^X and H^Y asymptotically decrease. The agents show a tendency to decrease choice uncertainty and to switch between almost deterministic actions. H^X and H^Y oscillate over the range [0, log 2] for εX = −0.1 and εY = 0.05 and over [0, log 3] for εX = 0.1 and εY = −0.05.
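The quantities plotted in Fig. 17 can be computed directly from the agents' mixed strategies. The sketch below assumes H^X = −Σ_n x_n log x_n (natural logarithm, consistent with the ranges [0, log 2] and [0, log 3] quoted above) and takes E from Eq. (D1) with relative entropy D(p‖q) = Σ_n p_n log(p_n/q_n); the helper names are hypothetical.

```python
import numpy as np


def entropy(p):
    """Choice uncertainty H = -sum_n p_n log p_n in nats; terms with p_n = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log(p[nz])))


def relative_entropy(p, q):
    """D(p || q) = sum_n p_n log(p_n / q_n); q must be positive wherever p is."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / q[nz])))


def energy(x, y, x_star, y_star, beta_x=1.0, beta_y=1.0):
    """E = D(x* || x) / beta_x + D(y* || y) / beta_y, following Eq. (D1)."""
    return relative_entropy(x_star, x) / beta_x + relative_entropy(y_star, y) / beta_y


center = np.full(3, 1 / 3)
print(entropy(center), np.log(3))             # fully mixed play: maximal uncertainty
print(entropy([0.5, 0.5, 0.0]), np.log(2))    # two actions mixed on a simplex edge
print(energy([0.9, 0.05, 0.05], center, center, center))  # large near a vertex
```

An orbit that spends long stretches near the edges of Δ therefore shows H values pinned near log 2 or 0, while E grows as the orbit approaches the boundary.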



3. Dissipative Dynamics 

If the memory loss rates (αX and αY) are positive, the dynamics becomes dissipative in information space U and exhibits limit cycles and chaotic attractors. (See Fig. 18.)

FIG. 18: Dissipative adaptive dynamics: stable limit cycles for αX = 0.025 (top) and αX = 0.021 (middle), and a chaotic attractor with αX = 0.0198 (bottom). All cases have εX = 0.5, εY = −0.3 and αY = 0.01. Period-doubling bifurcation to chaos occurs with decreasing αX.

Figure 19 (top) shows a diverse range of bifurcations as a function of αX. It shows the dynamics on a Poincaré section, projected onto v3. The fixed point (x*, y*) becomes unstable when αX is smaller than αc ≈ 0.055008938. Typically, period-doubling bifurcation to chaos occurs with decreasing αX. Chaos can occur only when εX + εY > 0.

FIG. 19: Bifurcation diagram (top) of dissipative dynamics (adapting with memory loss) projected onto coordinate v3 from a Poincaré section (u3 > 0, … = 0), and the largest two Lyapunov exponents λ1 and λ2 (bottom) as a function of αX ∈ [0.01, 0.03]. Here εX = 0.5, εY = −0.3 and αY = 0.01. Simulations show that λ3 and λ4 are always negative.

FIG. 20: Dynamics of H^X, H^Y, and E in dissipative adaptive dynamics: εX = 0.5, εY = −0.3, and αY = 0.01 for both. αX = 0.025 for the left plot and αX = 0.01 for the right. t* in the right figure is the (rather long) transient time. In both cases E does not diverge, due to memory loss.

Figure 20 shows the dynamics of H^X, H^Y, and E in dissipative adaptive dynamics. In both cases shown, memory loss keeps E from diverging. When αX = 0.025, H^X and H^Y converge to oscillations over the range [log 2, log 3]. When αX = 0.01, H^X and H^Y exhibit chaotic behavior over the range [0, log 3].

Figure 19 (bottom) shows that the largest Lyapunov exponent in U is positive across a significant fraction of the parameter space, indicating that chaos is common. The dual aspects of chaos, coherence and irregularity, imply that agents may behave cooperatively or competitively (or switch between both). This ultimately derives from the agents' successive mutual adaptation and memory loss in non-transitive interactions, such as in the RSP game, as was explained in Sec. III. Note that such global behavioral organization is induced by each agent's self-interested and myopic adaptation and by the "weak" uncertainty of its environment.



D. Three Agents Adapting under Rock-Scissors-Paper Interaction

Consider three agents adapting via (an extension of) the RSP interaction. Here the environment is given by the following interaction:

$$
a_{ijk} = \begin{cases}
2 & \text{win over the others,}\\
-2 & \text{lose to the other two,}\\
1 & \text{win over one other,}\\
-1 & \text{lose to one other,}\\
\epsilon_X & \text{tie,}
\end{cases}
\tag{62}
$$

and similarly for b_{jki} and c_{kij}, with i, j, k ∈ {R, S, P}. Here εX, εY, εZ ∈ (−1.0, 1.0). (See App. E for the detailed listing of the reinforcement scheme.) As before, we use normalized a'_{ijk}, b'_{jki}, and c'_{kij}:



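$$
a'_{ijk} = \begin{cases}
2 - \epsilon_X/3 & \text{win over the others,}\\
-2 - \epsilon_X/3 & \text{lose to the other two,}\\
1 - \epsilon_X/3 & \text{win over one other,}\\
-1 - \epsilon_X/3 & \text{lose to one other,}\\
2\epsilon_X/3 & \text{tie.}
\end{cases}
\tag{63}
$$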

The normalization does not affect the dynamics. 
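The invariance holds because adding the same constant to every a_{ijk} shifts both R_i and x·R by that constant, so their difference is unchanged. The check below assumes the adaptation equation for X has the form ẋ_i = x_i[β(R_i − x·R) − α(log x_i − x·log x)] with R_i = Σ_{j,k} a_{ijk} y_j z_k (an assumption for illustration), encodes Eq. (62) as "wins minus losses, or εX on a tie", and uses the shift εX/3, which makes each of X's reward slices sum to zero. Function names are hypothetical.

```python
import numpy as np

BEATS = {(0, 1), (1, 2), (2, 0)}  # 0 = R, 1 = S, 2 = P: R beats S, S beats P, P beats R


def reward(own, o1, o2, eps):
    """Eq. (62) as wins minus losses against the two others, or eps on a tie."""
    wins = ((own, o1) in BEATS) + ((own, o2) in BEATS)
    losses = ((o1, own) in BEATS) + ((o2, own) in BEATS)
    return eps if wins == losses else float(wins - losses)


def reward_tensor(eps):
    """a[i, j, k] for X's action i against Y's action j and Z's action k."""
    return np.array([[[reward(i, j, k, eps) for k in range(3)]
                      for j in range(3)] for i in range(3)])


def x_dot(x, y, z, a, alpha, beta=1.0):
    """Assumed adaptation equation for agent X with R_i = sum_jk a_ijk y_j z_k."""
    r = np.einsum("ijk,j,k->i", a, y, z)
    logx = np.log(x)
    return x * (beta * (r - x @ r) - alpha * (logx - x @ logx))


rng = np.random.default_rng(1)
x, y, z = rng.dirichlet(np.ones(3), size=3)   # three random interior mixed strategies
eps_x = 0.5
a = reward_tensor(eps_x)
a_norm = a - eps_x / 3                        # every entry shifted by the same constant
print(np.allclose(a_norm.sum(axis=(1, 2)), 0.0))
print(np.allclose(x_dot(x, y, z, a, 0.01), x_dot(x, y, z, a_norm, 0.01)))
```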

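The Nash equilibrium (x*, y*, z*) is at the simplex center:

$$
(x^*, y^*, z^*) = \left(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right). \tag{64}
$$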

It is also a fixed point of the adaptation dynamics. The Jacobian there is

$$
J = \cdots \tag{65}
$$
FIG. 21: Flows on the simplex edges in three-agent RSP: arrows indicate the direction of adaptation dynamics on Δ's boundary when the αs vanish.




FIG. 22: Periodic orbit (top: εX = 0.5, εY = −0.365, εZ = 0.8) and chaotic orbit (bottom: εX = 0.5, εY = −0.3, εZ = 0.6); the other parameters are αX = αY = αZ = 0.01. The Lyapunov spectrum for the chaotic dynamics is (λ1, ..., λ6) = (+45.2, +6.48, −0.336, −19.2, −38.5, −53.6) × 10^{-3}.

When αX = αY = αZ = α, its eigenvalues are

$$
(\lambda_1, \ldots, \lambda_6) + \alpha = \frac{\sqrt{3}}{3}\,\mathrm{i}\,(-1, -1, -2, 1, 1, 2). \tag{66}
$$

In the perfect memory case (αX = αY = αZ = 0), trajectories near (x*, y*, z*) are neutrally stable periodic orbits, since the λs are pure imaginary. In the memory loss case (αX, αY, αZ > 0), (x*, y*, z*) is asymptotically stable, since all Re(λi) are strictly negative. One expects multiple attractors in this case.
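The eigenvalues at the simplex center can be checked numerically. The sketch below rests on assumptions not stated in this section: rewards follow Eq. (62) read as "wins minus losses, or ε on a tie"; the adaptation equations have the form ẋ_i = x_i[β(R_i − x·R) − α(log x_i − x·log x)] with R_i = Σ_{j,k} a_{ijk} y_j z_k; and β = 1 for every agent (with a common β the imaginary parts scale by β). It differentiates the 9-dimensional flow by central finite differences and restricts the Jacobian to the 6-dimensional tangent space of Δ, where the result should be −α + i(√3/3)(−2, −1, −1, 1, 1, 2). Helper names are hypothetical.

```python
import itertools
import numpy as np

BEATS = {(0, 1), (1, 2), (2, 0)}  # 0 = R, 1 = S, 2 = P: R beats S, S beats P, P beats R


def reward(own, o1, o2, eps):
    """Eq. (62): wins minus losses against the two others, or eps on a tie."""
    wins = ((own, o1) in BEATS) + ((own, o2) in BEATS)
    losses = ((o1, own) in BEATS) + ((o2, own) in BEATS)
    return eps if wins == losses else float(wins - losses)


def reward_tensor(eps):
    """t[own, other1, other2]; symmetric in the two other agents' actions."""
    t = np.empty((3, 3, 3))
    for i, j, k in itertools.product(range(3), repeat=3):
        t[i, j, k] = reward(i, j, k, eps)
    return t


def flow(s, tensors, alpha, beta=1.0):
    """Assumed adaptation equations for agents X, Y, Z stacked as s = (x, y, z)."""
    x, y, z = s[:3], s[3:6], s[6:]
    agents = ((x, y, z), (y, z, x), (z, x, y))
    out = []
    for t, (p, o1, o2) in zip(tensors, agents):
        r = np.einsum("ijk,j,k->i", t, o1, o2)   # R_i = sum_jk t_ijk o1_j o2_k
        logp = np.log(p)
        out.append(p * (beta * (r - p @ r) - alpha * (logp - p @ logp)))
    return np.concatenate(out)


def jacobian(f, s, h=1e-6):
    """Central finite-difference Jacobian of f at s."""
    J = np.empty((s.size, s.size))
    for m in range(s.size):
        e = np.zeros(s.size)
        e[m] = h
        J[:, m] = (f(s + e) - f(s - e)) / (2 * h)
    return J


alpha = 0.01
tensors = [reward_tensor(eps) for eps in (0.5, -0.3, 0.6)]
center = np.full(9, 1 / 3)
J9 = jacobian(lambda s: flow(s, tensors, alpha), center)

# Orthonormal basis of {v in R^3 : sum(v) = 0}, one block per agent.
b = np.array([[1.0, -1.0, 0.0], [1.0, 1.0, -2.0]]).T
b /= np.linalg.norm(b, axis=0)
basis = np.kron(np.eye(3), b)            # 9 x 6, spans the tangent space of the simplices

lam = np.linalg.eigvals(basis.T @ J9 @ basis)
lam = lam[np.argsort(lam.imag)]
expected = -alpha + 1j * (np.sqrt(3) / 3) * np.array([-2, -1, -1, 1, 1, 2])
print(np.round(lam, 4))
```

The memory-loss term only shifts every eigenvalue by −α, which is why the perfect-memory orbits near the center are neutrally stable while any α > 0 makes the center asymptotically stable. The ε values drop out of the linearization.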

The collective state space Δ is now 6-dimensional, being the product of three two-dimensional agent simplices, Δ = ΔX × ΔY × ΔZ. The flow on Δ's boundary is shown in Fig. 21, giving the adaptation dynamics on the edges of Δ when the αs vanish.

We give two examples with αX = αY = αZ = 0.01 in Fig. 22: εX = 0.5, εY = −0.365, εZ = 0.8 (top: limit cycle) and εX = 0.5, εY = −0.3, εZ = 0.6 (bottom: chaos). Chaos is typically observed when εX + εY + εZ > 0. The limit cycles are highly complex manifolds whose shape depends on the 6-dimensional heteroclinic network on the simplex boundary. The Lyapunov spectrum for the chaotic dynamics is (λ1, ..., λ6) = (+45.2, +6.48, −0.336, −19.2, −38.5, −53.6) × 10^{-3}, so the dynamics has two positive Lyapunov exponents. Note that this dynamics could have many neutrally stable subspaces in three or more dimensions. These subspaces act as quasistable attractors and may even have symplectic structure. These properties of high-dimensional dynamics will be reported elsewhere.
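A quick consistency check on the quoted spectrum: assuming coordinates u_i = log(x_i/x_1) for each agent (a hypothetical choice; this section does not fix one) and adaptation equations of the form ẋ_i = x_i[β(R_i − x·R) − α(log x_i − x·log x)], one gets u̇_i = β(R_i − R_1) − α u_i, where R depends only on the other agents. The flow then contracts phase-space volume at the constant rate Σ_agents α(N − 1), and the Lyapunov exponents should sum to its negative: −0.06 for three agents with three actions and α = 0.01.

```python
import numpy as np

# Lyapunov spectrum of the chaotic orbit in Fig. 22, converted from units of 1e-3.
spectrum = np.array([45.2, 6.48, -0.336, -19.2, -38.5, -53.6]) * 1e-3

alphas = {"X": 0.01, "Y": 0.01, "Z": 0.01}    # memory-loss rates
n_actions = {"X": 3, "Y": 3, "Z": 3}           # R, S, P for every agent

# Each agent contributes N - 1 coordinates u_i with d(du_i/dt)/du_i = -alpha,
# so the divergence of the flow is the same everywhere in U.
divergence = -sum(alphas[a] * (n_actions[a] - 1) for a in alphas)

print(f"sum of exponents: {spectrum.sum():+.5f}")
print(f"divergence:       {divergence:+.5f}")
```

The residual, a few times 10^{-5}, is at the level of the rounding in the quoted exponents. For perfect memory (α = 0) the same argument gives a zero sum, in line with λ4 ≈ −λ1 and λ3 ≈ −λ2 in Table I.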



V. CONCLUDING REMARKS 

We developed a class of dynamical systems for collec- 
tive adaptation. We started with very simple agents, 
whose adaptation was a dynamic balance between adap- 
tation to environmental constraints and memory loss. A 
macroscopic description of a network of adaptive agents 
was produced. In one special case we showed that the 
dynamical system reduces to replicator equations, famil- 
iar in evolutionary game theory and population biology. 
In a more general setting, we investigated several of the 
resulting periodic, intermittent, and chaotic behaviors in 
which agent-agent interactions were explicitly given as 
game interactions. 

Self-organization induced by information flux was discussed using an information-theoretic viewpoint. We pointed out that, unlike single-agent adaptation, information flow in collective adaptation is multidimensional, and that global information maximization is of doubtful utility and a dynamic view of adaptation is more appropriate. We also noted that even with only two agents interacting nontransitively, a horseshoe can be produced in information space: the agents' local adaptation amplifies fluctuations in behavior, while memory loss stabilizes behavior. Since deterministic chaos occurs even in this simple setting, one expects that in higher-dimensional and heterogeneous adaptive systems intrinsic unpredictability would become a dominant collective behavior. When dynamic memory stored in collectives emerges, collective adaptation becomes a non-trivial problem. A detailed information-theoretic and dynamical-systems analysis will be reported elsewhere.

We close by indicating some future directions in which to extend the model.

First, as we alluded to during the development, there are difficulties in scaling the model to large numbers of agents. We focused on collectives with global coupling between all agents. However, in this case, the complexity of the interaction terms grows exponentially with the number of agents, which is both impractical from the viewpoints of analysis and simulation and unrealistic for natural systems that are large collectives. The solution to this, sketched in App. B, is either to develop spatially distributed agent collectives or to extend the equations to include explicit communication networks between agents. Both of these extensions will be helpful in modeling the many adaptive collectives noted in the introduction.
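The growth is easy to quantify. With global coupling, each of S agents needs a reward table with one entry per joint action, N^S entries; in a nearest-neighbor lattice of the kind sketched in App. B, each agent needs only one N × N block per neighbor. The counting functions below are hypothetical illustrations, not part of the model.

```python
def global_coupling_entries(n_actions: int, n_agents: int) -> int:
    """Reward-table entries when every agent's reward depends on all agents' actions."""
    return n_agents * n_actions ** n_agents


def lattice_entries(n_actions: int, n_agents: int) -> int:
    """Entries when each agent holds one N x N block per left/right neighbor (illustrative)."""
    return n_agents * 2 * n_actions ** 2


for s in (2, 3, 5, 10):
    print(s, global_coupling_entries(3, s), lattice_entries(3, s))
```

For three agents with three actions this gives 81 entries, matching the 27 rows of three rewards in Table V; at ten agents the globally coupled count already exceeds half a million.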

Second, and important for applications, is to develop a stochastic generalization of the deterministic equations of motion that accounts for the effects of finite and fluctuating numbers of agents and also of finite histories for adaptation. Each of these introduces its own kind of sampling stochasticity and will require a statistical dynamics analysis reminiscent of that found in population genetics. It is also important to consider the effects of asynchrony of adaptive behavior in this case.

Third, one necessary and possibly difficult extension will be to agents that adapt continuous-valued actions (say, learning the spatial location of objects) to their environments. Mathematically, this requires a continuous-space extension of the adaptation equations, and this results in models that are described by PDEs.

Finally, another direction, especially useful if one attempts to quantify global function in large collectives, will be structural and information-theoretic analyses of local and global adaptive behaviors. Analyzing the stored information and the causal architecture in each agent versus that in the collective, communication in networks, and emerging hierarchical structures in collective adaptation are projects now made possible using this framework.



APPENDIX A: CONTINUOUS TIME 

Here we give the derivation of the continuous-time lim- 
its that lead to the differential equations from the original 
stochastic discrete-time adaptation model. 

Denote the agent-agent interaction time scale, the number of interactions per adaptation interval, and the adaptation time scale as dτ, T, and t, respectively. We assume that adaptation is very slow compared to agent-agent interactions and take the limits dτ → 0 and T → ∞, keeping dt = T dτ finite. Then we take the limit dt → 0 to get the derivative of the vector Q^X(t).

With the discrete-time update rule and Q^X_i(0) = 0, we have

$$
Q^X_i(T) = \frac{1}{T}\sum_{k=1}^{T}\left[\sum_{m=1}^{M}\delta_{im}(k)\, r^X_{im}(k) - \alpha_X Q^X_i(k)\right]. \tag{A1}
$$

Thus, for continuous time, when action i is chosen by X at step t,

$$
\frac{Q^X_i(t+dt) - Q^X_i(t)}{dt} = \frac{1}{T\,dt}\sum_{k=Tt}^{T(t+dt)}\left[\sum_{m=1}^{M}\delta_{im}(k)\, r^X_{im}(k) - \alpha_X Q^X_i(k)\right]. \tag{A2}
$$



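Taking T → ∞ and dτ → 0, we have

$$
\frac{Q^X_i(t+dt) - Q^X_i(t)}{dt} = \frac{1}{dt}\int_t^{t+dt}\sum_{m=1}^{M}\delta_{im}(s)\, r^X_{im}(s)\, ds - \frac{\alpha_X}{dt}\int_t^{t+dt} Q^X_i(s)\, ds. \tag{A3}
$$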



Assuming r^X_{ij}(t) changes only as slowly as the adaptive dynamics, r^X_{ij}(t) is constant during the adaptation interval t → t + dt. If we assume in addition that the behaviors of the two agents X and Y are statistically independent at time t, then the law of large numbers gives

$$
\frac{1}{dt}\int_t^{t+dt}\sum_{m=1}^{M}\delta_{im}(s)\, r^X_{im}(s)\, ds = \sum_{m=1}^{M} r^X_{im}(t)\, y_m(t) \equiv R^X_i(t). \tag{A4}
$$



Now take dt → 0. Eqs. (A3) and (A4) together give

$$
\dot Q^X_i(t) = R^X_i(t) - \alpha_X Q^X_i(t) \tag{A5}
$$

for the continuous-time updating of the reinforcement memory. When the environment is static, given as r^X_{ij}(t) = a_{ij}, then

$$
R^X_i(t) = \sum_{m=1}^{M} a_{im}\, y_m(t). \tag{A6}
$$



The single-agent case is given by letting y = (1, 0, 0, ..., 0) be fixed and a_{i1} = c_i, i = 1, ..., N.
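In the single-agent case Eq. (A5) can be integrated directly: with R_i = c_i constant, each memory Q_i relaxes to c_i/α. The sketch below Euler-integrates Eq. (A5) and turns memories into choice probabilities with x_i = e^{βQ_i}/Σ_n e^{βQ_n}; that softmax choice rule is an assumption here (it is not stated in this appendix), as are the rewards c_i, step size, and parameter values.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax; the choice rule x_i ~ exp(beta * Q_i) is assumed."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


def integrate_memory(c, alpha, beta, dt=0.01, steps=5000):
    """Euler-integrate Eq. (A5), dQ_i/dt = R_i - alpha * Q_i, with R_i = c_i fixed (Eq. (A6))."""
    q = np.zeros(len(c))                 # Q_i(0) = 0
    for _ in range(steps):
        q = q + dt * (c - alpha * q)
    return q, softmax(beta * q)


c = np.array([1.0, 0.5, 0.0])            # hypothetical static rewards c_i
alpha, beta = 0.5, 1.0
q, x = integrate_memory(c, alpha, beta)
print(q)                                  # close to c / alpha
print(x)                                  # close to softmax(beta * c / alpha)
```

Memory loss thus caps Q_i at c_i/α, so x stays strictly mixed for any finite β/α; with α → 0 the memories grow without bound and x approaches the pure strategy on the largest c_i.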



APPENDIX B: NETWORK INTERACTIONS 

We can describe heterogeneous network interactions within our model. We give an example of a model for lattice interactions here. Agents s = 1, 2, ..., S are on a spatial lattice: agent s interacts with agent s − 1 through bi-matrices (A^s, B^{s−1}) and with agent s + 1 through (B^s, A^{s+1}). Each bi-matrix is 2 × 2. See Fig. 23.

FIG. 23: Agent s interacts with agent s − 1 through bi-matrices (A^s, B^{s−1}) and with agent s + 1 through (B^s, A^{s+1}).

Agents choose actions among the 2 × 2 action pairs for both the right and left neighboring agents. The action pairs are (1, 1), (1, 2), (2, 1), (2, 2) and are weighted with probabilities x^s_1, ..., x^s_4. Inserting the interaction bi-matrices into the S-agent adaptive dynamics of Eq. (27) gives

$$
\dot x^s_i = \beta_s x^s_i \left[\,\cdots\,\right] + \alpha_s x^s_i \left(-\log x^s_i + \sum_{n=1}^{4} x^s_n \log x^s_n\right), \tag{B1}
$$

where Σ_n x^s_n = 1, p^s = (x^s_1 + x^s_2, x^s_3 + x^s_4), and q^s = (x^s_1 + x^s_3, x^s_2 + x^s_4). In a similar way, arbitrary network interactions can be described by our adaptive dynamics given in Eq. (27).
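The marginals p^s and q^s are row and column sums once the four pair probabilities are arranged as a 2 × 2 array indexed by (first action, second action). A minimal sketch, with a hypothetical function name:

```python
import numpy as np


def pair_marginals(x):
    """Return (p, q) for pair probabilities x = (x1, x2, x3, x4) on (1,1), (1,2), (2,1), (2,2).

    p = (x1 + x2, x3 + x4) sums over the second component of each pair;
    q = (x1 + x3, x2 + x4) sums over the first component.
    """
    grid = np.asarray(x, dtype=float).reshape(2, 2)  # grid[a, b] = prob. of pair (a+1, b+1)
    return grid.sum(axis=1), grid.sum(axis=0)


p, q = pair_marginals([0.1, 0.2, 0.3, 0.4])
print(p, q)
```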



APPENDIX C: NASH EQUILIBRIA 

The Nash equilibria (x*, y*) of the bi-matrix game (A, B) are those states in which no player can do better by changing state; that is,

$$
x^* A y^* \ge x A y^* \quad\text{and}\quad y^* B x^* \ge y B x^* \tag{C1}
$$

for all (x, y) ∈ ΔX × ΔY. If they exist in the interior, the solutions of the following simultaneous equations are Nash equilibria:

$$
(Ay)_1 = (Ay)_i \ \text{and}\ (Bx)_1 = (Bx)_j \iff (Ay)_i - xAy = (Bx)_j - yBx = 0, \tag{C2}
$$

where Σ_n x_n = Σ_m y_m = 1.

It is known that N = M is a necessary condition for the existence of a unique Nash equilibrium in the interior of Δ. With N = M in the perfect memory case (αX = αY = 0), the unique Nash equilibrium, if it exists, is the fixed point given by the intersection of the x- and y-nullclines of the adaptation equations.

This Nash equilibrium is not asymptotically stable, but the time average of trajectories converges to it. To see this, suppose that x_i(t) > δ and y_j(t) > δ for all sufficiently large t; then

$$
\frac{d}{dt}(\log x_i) = (Ay)_i - xAy, \qquad \frac{d}{dt}(\log y_j) = (Bx)_j - yBx. \tag{C3}
$$
Integrating both sides from 0 to T and dividing by T, we get

$$
\frac{\log x_i(T) - \log x_i(0)}{T} = \sum_{m=1}^{M} a_{im}\,\bar y_m - S_a, \qquad
\frac{\log y_j(T) - \log y_j(0)}{T} = \sum_{n=1}^{N} b_{jn}\,\bar x_n - S_b, \tag{C4}
$$

where

$$
\bar x_i = T^{-1}\int_0^T x_i\,dt \quad\text{and}\quad \bar y_j = T^{-1}\int_0^T y_j\,dt, \tag{C5}
$$

and

$$
S_a = T^{-1}\int_0^T xAy\,dt \quad\text{and}\quad S_b = T^{-1}\int_0^T yBx\,dt. \tag{C6}
$$

Letting T → ∞, the left-hand sides converge to 0, since x_i and y_j stay bounded away from 0. Thus, x̄ and ȳ are a solution of Eqs. (C2). (This proof follows Ref. [43].)
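The time-average argument can be illustrated numerically for the zero-sum Rock-Scissors-Paper game of Table IV with perfect memory. The sketch assumes β = 1, writes the dynamics as in Eq. (C3), and uses a fourth-order Runge-Kutta step; the initial condition, step size, horizon T, and helper names are arbitrary illustrative choices. By Eq. (C4), the deviation of the time averages from the interior Nash equilibrium (1/3, 1/3, 1/3) should shrink roughly like 1/T even though the orbit itself keeps cycling.

```python
import numpy as np


def rsp_matrix(eps):
    """Table IV rewards for one agent: rows = own action (R, S, P), columns = opponent's."""
    return np.array([[eps, 1.0, -1.0],
                     [-1.0, eps, 1.0],
                     [1.0, -1.0, eps]])


def flow(x, y, A, B):
    """Perfect-memory dynamics, Eq. (C3) with beta = 1."""
    ax, by = A @ y, B @ x
    return x * (ax - x @ ax), y * (by - y @ by)


def time_average(x, y, A, B, dt=0.01, T=200.0):
    """RK4-integrate and return the time averages of x and y over [0, T]."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    steps = int(round(T / dt))
    sum_x, sum_y = np.zeros_like(x), np.zeros_like(y)
    for _ in range(steps):
        sum_x += x
        sum_y += y
        k1 = flow(x, y, A, B)
        k2 = flow(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1], A, B)
        k3 = flow(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1], A, B)
        k4 = flow(x + dt * k3[0], y + dt * k3[1], A, B)
        x = x + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y = y + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return sum_x / steps, sum_y / steps


eps_x = 0.2
A, B = rsp_matrix(eps_x), rsp_matrix(-eps_x)   # zero-sum: B = -A^T
x_bar, y_bar = time_average([0.5, 0.25, 0.25], [0.2, 0.3, 0.5], A, B)
print(np.round(x_bar, 3), np.round(y_bar, 3))
```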



APPENDIX D: HAMILTONIAN DYNAMICS 

Consider a game (A, B) that admits an interior Nash equilibrium (x*, y*) ∈ ΔX × ΔY and is zero-sum (B = −A^T). Then

$$
E = \beta_X^{-1} D(x^* \,\|\, x) + \beta_Y^{-1} D(y^* \,\|\, y) \tag{D1}
$$

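is a constant of the motion. This follows by direct calculation:

$$
\begin{aligned}
\frac{dE}{dt} &= -\frac{1}{\beta_X}\sum_{n=1}^{N} x^*_n \frac{\dot x_n}{x_n} - \frac{1}{\beta_Y}\sum_{m=1}^{M} y^*_m \frac{\dot y_m}{y_m} \\
&= -(x^* A y - x A y) - (y^* B x - y B x) \\
&= (x^* - x) A (y^* - y) + (y^* - y) B (x^* - x) \\
&= 0.
\end{aligned}
\tag{D2}
$$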
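Eq. (D2) can be checked at random states. For the Rock-Scissors-Paper matrices of Table IV, write A = A0 + εX I and B = A0 + εY I, where A0 is the antisymmetric win/loss part (notation introduced here). The third line of (D2) then reduces to (εX + εY)(x* − x)·(y* − y), which vanishes exactly in the zero-sum case εY = −εX. The sketch below evaluates the first line of (D2) from the perfect-memory growth rates, with arbitrary positive βs (they cancel), and compares; the helper names and parameter values are hypothetical.

```python
import numpy as np


def rsp_matrix(eps):
    """Table IV rewards for one agent: rows = own action (R, S, P), columns = opponent's."""
    return np.array([[eps, 1.0, -1.0],
                     [-1.0, eps, 1.0],
                     [1.0, -1.0, eps]])


def dE_dt(x, y, A, B, x_star, y_star, beta_x=0.7, beta_y=1.3):
    """First line of Eq. (D2) with perfect-memory growth rates (alpha = 0)."""
    gx = beta_x * (A @ y - x @ A @ y)      # dx_n/dt divided by x_n
    gy = beta_y * (B @ x - y @ B @ x)      # dy_m/dt divided by y_m
    return -(x_star @ gx) / beta_x - (y_star @ gy) / beta_y


rng = np.random.default_rng(0)
center = np.full(3, 1 / 3)                 # interior Nash equilibrium of RSP
eps_x = 0.3
results = []
for eps_y in (-eps_x, 0.1):                # zero-sum first, then not zero-sum
    A, B = rsp_matrix(eps_x), rsp_matrix(eps_y)
    x, y = rng.dirichlet(np.ones(3), size=2)
    value = dE_dt(x, y, A, B, center, center)
    predicted = (eps_x + eps_y) * (center - x) @ (center - y)
    results.append((eps_y, value, predicted))
    print(f"eps_Y = {eps_y:+.1f}: dE/dt = {value:+.3e}, predicted {predicted:+.3e}")
```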

This holds for any number of agents. Give the agents equal numbers of actions (N = M), set α to zero (perfect memory), and make all βs finite and positive. Then the adaptive dynamics is Hamiltonian in the information space U = (u, v), with the above constant of motion E:

$$
\dot U = J \nabla_U E, \tag{D3}
$$

with Poisson structure J,

$$
J = \begin{pmatrix} 0 & P \\ -P^{T} & 0 \end{pmatrix} \quad\text{with}\quad P = -\beta_X \beta_Y A. \tag{D4}
$$



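Proof:

$$
\frac{\partial E}{\partial u_i} = \frac{\partial}{\partial u_i}\left[\beta_X^{-1}\sum_{n=1}^{N} x^*_n \log x^*_n + \beta_Y^{-1}\sum_{n=1}^{N} y^*_n \log y^*_n + \cdots\right] = \cdots, \tag{D5}
$$

$$
\frac{\partial E}{\partial v_j} = \cdots. \tag{D6}
$$

Since (x*, y*) is an interior Nash equilibrium, Eq. (C2) implies that (Ay*)_i and (Bx*)_j do not depend on i and j. Thus,

$$
\frac{\partial E}{\partial u} = \cdots \tag{D7}
$$

and

$$
J \nabla_U E = \begin{pmatrix} 0 & -\beta_X \beta_Y A \\ \beta_X \beta_Y A^{T} & 0 \end{pmatrix}
\begin{pmatrix} \partial E / \partial u \\ \partial E / \partial v \end{pmatrix} = \cdots = \dot U. \tag{D8}
$$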



We can transform U = (u, v) to canonical coordinates U' = (p, q):

$$
\dot U' = S \nabla_{U'} E, \tag{D9}
$$

with

$$
S = \begin{pmatrix} 0 & -I \\ I & 0 \end{pmatrix}, \tag{D10}
$$

where I is the N × N identity matrix, via a linear transformation U' = MU to the Hamiltonian form. □



APPENDIX E: REINFORCEMENT SCHEMES 
AND INTERACTION MATRICES 

Here we give the reinforcement schemes and interaction matrices for the constant-environment collectives investigated in Sec. IV.



1. Matching Pennies 

This game describes a non-transitive competition. Each agent chooses a coin, which turns up either heads (H) or tails (T). Agent X wins when the coins differ; otherwise agent Y wins. Table II gives the reinforcement scheme for the various possible plays. Note that the εs determine the size of the winner's rewards. When εX + εY = 0, the game is zero-sum. The Nash equilibrium is x* = y* = (1/2, 1/2).

Various extensions of Matching Pennies to more than two players are known. We give the Even-Odd game as an example for three agents X, Y, and Z in a collective. All flip a coin. Agents X and Y win when the number of heads is even; otherwise Z wins. Table III gives the reinforcement scheme. When the εs add to zero, the game is zero-sum. The unique mixed Nash equilibrium is x* = y* = z* = (1/2, 1/2), the simplex center.



X  Y   Reward to X   Reward to Y
H  H   −εX           −εY
H  T    εX            εY
T  H    εX            εY
T  T   −εX           −εY

TABLE II: The two-person Matching Pennies game: εX ∈ (0.0, 1.0] and εY ∈ [−1.0, 0.0).



X  Y  Z   Reward to X   Reward to Y   Reward to Z
H  H  H   −εX           −εY           −εZ
H  H  T    εX            εY            εZ
H  T  H    εX            εY            εZ
H  T  T   −εX           −εY           −εZ
T  H  H    εX            εY            εZ
T  H  T   −εX           −εY           −εZ
T  T  H   −εX           −εY           −εZ
T  T  T    εX            εY            εZ

TABLE III: The three-player Even-Odd game: εX ∈ (0.0, 1.0] and εY, εZ ∈ [−1.0, 0.0).

2. Rock-Scissors-Paper

This game describes a non-transitive three-sided competition between two agents: rock (R) beats scissors (S), scissors beats paper (P), but paper beats rock. Table IV gives the reinforcement scheme. The εs here control the rewards for ties. When they add to zero, the game is zero-sum. The unique mixed Nash equilibrium is x* = y* = (1/3, 1/3, 1/3), again the center of the simplex.

The extension of the RSP interaction to three agents is straightforward. The reinforcement scheme is given in Table V. When εX + εY + εZ = 0, the game is zero-sum. The Nash equilibrium is x* = y* = z* = (1/3, 1/3, 1/3).
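Every entry of Table V follows from a single rule: an agent's reward is the number of opponents it beats minus the number that beat it, except that the balanced cases (all three actions equal, or all three different) pay that agent's ε. The sketch below regenerates the table from this compact restatement of Eq. (62) and checks the zero-sum property; the function names and ε values are hypothetical.

```python
import itertools

ACTIONS = "RSP"
BEATS = {("R", "S"), ("S", "P"), ("P", "R")}


def reward(own, others, eps):
    """Wins minus losses against the other players, or eps when they balance (a tie)."""
    wins = sum((own, o) in BEATS for o in others)
    losses = sum((o, own) in BEATS for o in others)
    return eps if wins == losses else wins - losses


def three_agent_table(eps):
    """Map each joint action (X, Y, Z) to the rewards (r_X, r_Y, r_Z) of Table V."""
    table = {}
    for play in itertools.product(ACTIONS, repeat=3):
        table[play] = tuple(
            reward(play[a], play[:a] + play[a + 1:], eps[a]) for a in range(3)
        )
    return table


eps = (0.5, -0.3, -0.2)            # sums to zero, so every row should sum to zero
table = three_agent_table(eps)
for play in [("S", "R", "R"), ("R", "S", "P"), ("P", "S", "P")]:
    print("".join(play), table[play])
```

Because the non-tie rewards in each row always sum to zero, the game is zero-sum exactly when εX + εY + εZ = 0, as stated above.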



X  Y   Reward to X   Reward to Y
R  R    εX            εY
R  S    1            −1
R  P   −1             1
S  R   −1             1
S  S    εX            εY
S  P    1            −1
P  R    1            −1
P  S   −1             1
P  P    εX            εY

TABLE IV: The two-person Rock-Scissors-Paper game: εX, εY ∈ (−1.0, 1.0).



X Y Z   rX   rY   rZ  |  X Y Z   rX   rY   rZ  |  X Y Z   rX   rY   rZ
R R R   εX   εY   εZ  |  S R R   −2    1    1  |  P R R    2   −1   −1
R R S    1    1   −2  |  S R S   −1    2   −1  |  P R S   εX   εY   εZ
R R P   −1   −1    2  |  S R P   εX   εY   εZ  |  P R P    1   −2    1
R S R    1   −2    1  |  S S R   −1   −1    2  |  P S R   εX   εY   εZ
R S S    2   −1   −1  |  S S S   εX   εY   εZ  |  P S S   −2    1    1
R S P   εX   εY   εZ  |  S S P    1    1   −2  |  P S P   −1    2   −1
R P R   −1    2   −1  |  S P R   εX   εY   εZ  |  P P R    1    1   −2
R P S   εX   εY   εZ  |  S P S    1   −2    1  |  P P S   −1   −1    2
R P P   −2    1    1  |  S P P    2   −1   −1  |  P P P   εX   εY   εZ

TABLE V: The three-person Rock-Scissors-Paper game (rX, rY, rZ are the rewards to X, Y, Z): εX, εY, εZ ∈ (−1.0, 1.0).